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1.  STATEMENT  OF  WORK 


Many  Air  Force  applications  can  be  modeled  as  some  exten¬ 
sion  of  a  pure  network  problem.  These  extensions  may  re¬ 
quire  additional  side  constraints,  arcs  that  involve  attrition  or 
flow  of  multiple  commodities  on  a  single  arc.  In  all  cases,  the 
network  models  require  integer  flows  and  may  be  viewed  as 
special  cases  of  the  integer  programming  model.  Since  some 
of  these  applications  demand  computer  hardware  several  or¬ 
ders  of  magnitude  faster  than  the  fastest  machines  available, 
we  have  investigated  the  use  of  parallelism  to  increase  the 
computational  speed  of  these  algorithms.  Very  powerful 
hardware  (in  terms  of  millions  of  floating  point  operations 
per  second)  can  be  built  using  many  low  cost  standard  chips, 
all  designed  to  operate  in  parallel.  Our  research  program  ob¬ 
jective  is  to  develop  and  empirically  test  new  serial  and  paral¬ 
lel  algorithms  and  software  for  network  based  models.  The 
problems  studied  during  the  past  eighteen  months  include  the 
generalized  network  problem,  the  transportation  problem. 
sparse  and  dense  assignment  problems,  the  one-to-one  short¬ 
est  path  problem  problem,  and  the  singly  constrained  assign¬ 
ment  problem.  Algorithms  for  all  of  these  models  have  been 
developed  and  empirically  tested  on  a  variety  of  sequential 
and  parallel  computers. 
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n.  PUBUCATIONS 


» 


Title 

An  Empirical  Analysis  of  the  Dense  Assignment  Problem: 
Sequential  and  Parallel  Implementations 

Revised  May  1991 


Authors 

J.  Kennington  and  Z.  Wang 


Executive  Summary 

i  We  performed  a  thorough  empirical  analysis  comparing  the 
auction  algorithm  with  the  shortest  augmenting  path  algo¬ 
rithm  (SAP)  for  the  dense  jissignment  problem.  We  found 
»  that  the  software  implementation  of  the  SAP  algorithm  was 
superior  on  serial  machines.  A  parallel  implementation  of 
this  software  )delded  speedups  of  four  using  ten  processors. 
I  We  successfully  solved  problems  having  over  one  million  arcs 
in  less  that  20  seconds  on  a  Sequent  Symmetry  S81. 


Publication  Status 

This  paper  appears  in  the  ORSA  Journal  on  Computing.  3, 
*  299-306,  1991. 
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Title 

The  Singly  Constrained  Assignment  Problem 
Revised  December  1991 


Authors 

J.  Kennington  and  F.  Mohammadi 


Executive  Summaiy 

This  paper  presents  a  new  algorithm  for  the  singly 
constrained  assignment  problem  along  with  an  empirical 
analysis  of  the  software  implementation  of  this  algorithm. 
The  new  algorithm  is  based  on  Lagrangean  duality  theory 
and  involves  solving  a  series  of  pure  assignment  problems. 
The  software  implementation  has  successfully  solved 
problems  having  over  one-half  million  binary  variables  in 
less  than  fifteen  minutes  on  a  Sequent  Symmetry  S81  using  a 
single  processor. 


Publication  Status 

This  paper  has  been  submitted  for  publication  and  is  currently 
under  review. 
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Generalized  Networks:  Parallel  Algorithms  and  an  Empirical 
Analysis 

Revised  January  1991 


Authors 

R.  Clark,  J.  Kennington,  R.  Meyer  and  M.  Ramamurti 


Executive  Summary 

A  generalized  network  problem  is  a  specialization  of  the 
linear  programming  problem  in  which  each  column  of  the 
constraint  matrix  has  at  most  two  nonzero  entries.  It  is  well 
known  that  a  generalized  network  problem  possesses  a  special 
graphical  structure  which  can  be  exploited  in  algorithm 
development.  Specialized  sequential  and  parallel  codes 
which  exploit  this  graphical  structure  have  been  developed 
and  empirically  tested  on  a  Sequent  Symmetry  S81. 


Publication  Status 

This  paper  has  been  accepted  for  publication  in  the  ORSA 
Journal  on  Computing. 


Title 


Computational  Comparison  of  Sequential  and  Parallel 
Algorithms  for  the  One-To-One  Shortest-Path  Problem 

Revised  January  1992 


Authors 

R.  Helgason,  J.  Kennington,  and  B.  Stewart 


Executive  Summary 

The  problem  of  finding  the  shortest  path  between  a 
designated  pair  of  nodes  is  a  fundamental  problem  in 
operations  research  which  also  serves  as  a  building  block  for 
other  algorithms.  The  classical  Dijkstra  algorithm  begins  at 
one  of  the  designated  nodes  and  fans  out  from  this  node  until 
the  other  designated  node  becomes  a  member  of  the  labeled 
set.  In  this  paper  we  empirically  demonstrate  that  a  better 
algorithm  is  obtained  by  a  procedure  that  begins  at  both 
designated  nodes  and  fans  out  in  both  directions  either 
simultaneously  in  parallel  or  alternately  in  series. 


Publication  Status 

This  paper  has  been  submitted  for  publication  and  is  currently 
under  review. 
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Title 


The  Shortest  Augmenting  Path  Algorithm  for  the 
Transportation  Problem 

Revised  February  1991 


Authors 

J.  Kennington  and  Z.  Wang 


Executive  Summaiy 

This  study  presents  an  empirical  analysis  of  the  shortest 
augmenting  path  algorithm,  augmented  by  advanced  start 
heuristics,  for  the  transportation  problem.  The  software 
implementation  of  our  algorithm  is  the  best  available  for 
problems  having  a  small  total  supply.  As  the  total  supply 
increases,  the  algorithm  degrades  and  is  computationally 
slower  than  competing  primal  simplex  software. 


Publication  Status 

This  paper  has  been  issued  as  a  technical  report. 
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ABSTRACT 


The  best  algorithms  for  the  dense  assignment  problem  are  acknowledged  to  be  the  auction 
algorithm  and  the  shortest  augmenting  path  algorithm.  In  this  investigation  we  present  an 
empirical  analysis  of  two  of  the  current  best  software  implementations  of  these  two  meth¬ 
ods  on  three  different  serial  machines.  These  software  implementations  were  developed 
by  Bertsekas  of  the  Massachusetts  Institute  of  Technology  and  by  Jonker  and  Volgenant  of 
the  University  of  Amsterdam.  This  report  is  an  independent  evaluation  of  the  software 
implementation  of  these  two  algorithms.  For  the  sample  of  problems  examined  and  the 
sample  of  hardware  used  (IBM  3081D,  Sequent  Symmetry  S81,  and  VAX  750),  we  found 
that  the  shortest  augmenting  path  algorithm  was  the  fastest.  We  also  report  our  empirical 
results  with  a  parallel  version  of  the  shortest  augmenting  path  algorithm.  On  1200x1200 
dense  assignment  problems,  speedups  of  approximately  four  were  achieved  using  ten 
processors.  Million  arc  problems  were  solved  in  less  than  twelve  seconds  on  a  Sequent 
Symmetry  S81  with  the  parallel  shortest  augmenting  path  algorithm. 
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The  classical  assignment  problem  (also  known  as  the  weighted  bipartite  matching 
problem'!  is  to  assign  n  men  to  n  distinct  jobs  so  that  the  total  cost  of  as^’^nment  is 
minimized.  Mathematically,  this  may  be  formulated  as  the  following  special  mathemati¬ 
cal  program: 


minimize 

M 

J? 

X 

j.  j 

subject  to: 

j  =  l»-..in 
i 

2  Xy  =  1,  i  =  l....,n 

j 

Xy  G  {0,1},  (all  i,j) 

where  Cy  denotes  the  cost  for  assigning  man  i  to  job  j  and  Xjj  =  1  implies  that  man  i  is 
assigned  to  job  j.  Due  to  the  total  unimodularity  of  the  constraint  matrix,  this  problem 
can  be  solved  by  the  simplex  algorithm  and  every  basic  solution  will  have  Xy  G  {0, 1}  (see 
Kennington  and  Helgason  [1980]).  Hence,  classic  linear  programming  duality  theory  and 
Kuhn-Tucker  optimality  conditions  can  be  used  in  algorithm  development  for  this  prob¬ 
lem. 

Specialized  algorithms  for  the  assignment  problem  can  be  classified  into  five  catego¬ 
ries  as  follows: 

(i)  maximum  flow  (primal-dual), 

(ii)  primal  simplex, 

(iii)  dual  simplex, 

(iv)  auction  algorithm,  and 

(v)  shortest  augmenting  paths. 

The  first  maximum  flow  algorithm  was  developed  by  Kuhn  [1955]  and  is  called  the  Hun¬ 
garian  method.  Its  name  comes  from  the  fact  that  the  algorithm  is  developed  from  results 
of  two  Hungarian  mathematicians.  Variations  were  presented  by  Kuhn  [1956].  Derigs 
[1985]  shows  that  the  shortest  augmenting  path  method  can  be  viewed  as  a  special  imple- 
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mentation  of  the  Hungarian  method.  Ahuja,  Magnanti,  and  Orlin  [1989]  call  the  Hungar¬ 
ian  method  a  primal-dual  variant  of  the  successive  augmenting  path  algorithm. 

A  specialized  primal  simplex  algorithm  for  the  assignment  problem  was  developed  by 
Barr,  Glover  and  Klingman  [1977].  Their  method,  called  the  alternating  basis  algorithm, 
only  considers  a  subset  of  the  possible  bases.  This  idea  was  further  exploited  by  Hung 
[1983]  in  his  development  of  a  polynomial  simplex  algorithm  for  this  problem.  An  exten¬ 
sive  computational  study  comparing  the  Hungarian  algorithm  with  primal  simplex  meth¬ 
ods  was  performed  by  McGinnis  [1983]. 

A  dual  polynomial  simplex  method  known  as  the  signature  method  has  been  devel¬ 
oped  by  Balinski  [1985,  1986].  Extensions  for  the  sparse  assignment  problem  were  devel¬ 
oped  by  Goldfarb  [1985],  and  the  relationship  between  the  signature  method  and  the 
shortest  augmenting  path  method  was  presented  by  Derigs  [1985].  Another  variation  of 
this  algorithm  has  been  developed  by  Akgul  [1988]. 

Bertsekas  [1979,  1981,  1988]  and  Bertsekas  and  Eckstein  [1988]  give  a  complete  theo¬ 
retical  development  of  the  auction  algorithm.  This  algorithm  critically  depends  on  the 
ideas  of  e -complementary  slackness  and  adaptive  scaling.  A  recent  version  of  Bertsekas’ 
auction  code  was  completed  in  June  1988  and  has  been  placed  in  the  public  domain.  Barr 
and  Christiansen  [1989]  have  experimented  with  a  parallel  version  of  this  algorithm  writ¬ 
ten  in  C++  on  the  Sequent  Symmetry  S81,  and  Phillips  and  Zenios  [1989]  have  experi¬ 
mented  with  this  algorithm  on  the  Connection  Machine.  Perry  [1988]  experimented  with 
a  parallel  version  of  the  auction  algorithm  on  both  the  Alliant  FX/8  and  the  Sequent 
Symmetry  S81. 

Hung  and  Rom  [1980]  presented  a  shortest  augmenting  path  method  which  had  both  a 
polynomial  bound  and  good  computational  results.  Other  variants  of  the  shortest  aug¬ 
menting  path  method  have  been  presented  by  Glover,  Glover  and  Klingman  [1986]  and 


Jonker  and  Volgenant  [1987].  An  extensive  computational  study  with  the  shortest  aug¬ 
menting  path  method  may  be  found  in  Derigs  [1985]. 

Recently,  scaling  based  algorithms  have  been  presented  for  the  assignment  problem 
(see  Gabow  [1985]  and  Orlin  and  Ahuja  [1988]).  These  algorithms  are  derivatives  of  the 
Hungarian  method  and  the  auction  algorithm,  respectively,  with  the  added  feature  of  data 
scaling.  Bertsekas  is  using  a  similar  idea  in  his  June  1988  auction  code. 

There  are  three  ways  to  analyze  the  performance  of  an  algorithm:  worst-case  analy¬ 
sis,  average  case  analysis,  and  empirical  analysis.  The  worst-case  analysis  results  for  the 
assignment  problem  are  presented  in  Ahuja,  Magnanti,  and  Orlin  [1989].  The  objective 
of  our  study  is  to  present  an  empirical  analysis  of  two  of  the  fastest  serial  codes.  These 
codes  were  obtained  directly  from  the  authors,  and  they  represent  the  current  best  soft¬ 
ware  implementation  of  the  top  competing  algorithms.  They  were  run  on  three  different 
machines  (IBM  3081D,  Sequent  Symmetry  S81,  and  VAX  750)  to  allow  for  an  analysis 
with  respect  to  differences  in  machine  architecture  and  compiler.  The  shortest  augment¬ 
ing  path  code  was  then  parallelized  and  the  speedup  achieved  on  a  shared  memory  multi¬ 
processor  was  reported. 

1.  SEQUENTIAL  CODES 

Five  algorithms  for  solving  dense  assignment  problems  have  been  implemented  and 
computationally  compared  by  various  researchers.  We  are  not  aware  of  any  computa¬ 
tional  studies  involving  the  dual  algorithms.  Derigs  [1985]  shows  that  a  shortest  augment¬ 
ing  path  implementation  is  superior  to  a  Hungarian  implementation.  This  has  been  con¬ 
firmed  by  Jonker  and  Volgenant  [1987]  and  by  the  authors  in  a  comparison  with  the 
codes  of  Jonker  and  Volgenant  [1987]  and  Rardin  [1986].  Glover,  Glover  and  Klingman 
[1977]  found  that  their  shortest  augmenting  path  code  was  superior  to  the  specialized 


simplex  code  of  Barr,  Glover  and  Klingman  [1977].  The  authors  have  confirmed  this  with 
a  comparison  of  the  Jonker  and  Volgenant  [1987]  and  Barr,  Glover  and  Klingman  [1977] 
codes.  Jonker  and  Volgenant  [1987]  also  concluded  that  their  dense  shortest  augmenting 
path  code  was  superior  to  the  auction  code  of  Bertsekas  [1981]. 

After  many  studies  over  a  fifteen  year  period,  it  is  acknowledged  that  the  two  best 
algorithms  for  dense  assignment  problems  are  the  auction  algorithm  and  the  shortest 
augmenting  path  algorithm.  One  of  the  best  software  implementations  of  the  auction 
algorithm  is  the  code  of  Bertsekas  (Version  1.0,  June  1988).  Some  of  the  other  auction 
codes  are  very  sensitive  to  the  cost  structure  and  degrade  as  the  cost  range  becomes 
larger.  These  other  codes  sometimes  work  very  well  for  small  cost  ranges  and  fail  for 
cost  ranges  as  small  as  [0,1000].  By  the  use  of  adaptive  scaling,  Bertsekas’  code  works 
well  for  both  a  small  cost  range  and  a  large  cost  range.  This  code  scales  all  cost  data  by 
n+1  and  solves  a  sequence  of  problems  with  decreasing  values  of  the  stopping  criterion. 
All  calculations  are  performed  in  integer  arithmetic.  Two  new  parallel  auction  codes 
which  do  not  use  adaptive  scaling  may  be  found  in  Kempka,  Kennington,  and  Zaki 
[1991].  An  excellent  empirical  analysis  comparing  a  parallel  version  of  the  auction  algo¬ 
rithm  with  a  parallel  version  of  the  Jonker-Volgenant  code  on  an  Alliant  FX/8  may  be 
found  in  Zaki  [1990]. 

We  believe  that  the  best  implementation  of  the  shortest  augmenting  path  algorithm  for 
serial  machines  was  developed  by  Jonker  and  Volgenant  [1987].  Dijkstra’s  algorithm  is 
used  to  obtain  the  shortest  augmenting  paths  and  only  integer  arithmetic  is  required.  The 
code  also  incorporates  an  elaborate  pre-processing  stage  which  greatly  reduces  the  total 
number  of  times  that  the  Dijkstra  algorithm  is  required.  It  also  uses  a  clever  data  struc¬ 
ture  for  updating  the  dual  variables  after  a  shortest  augmenting  path  has  been  found.  The 
code  maintains  dual  variables  and  the  reduced  costs  are  calculated  as  required.  Both 
codes  are  written  in  standard  FORTRAN. 
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The  empirical  results  of  our  experiment  are  presented  in  Table  1  and  Figure  1 .  All 
times  exclude  input  and  output  but  include  the  pre-processing.  For  all  test  problems,  all 
cost  ranges,  and  all  machines,  the  shortest  augmenting  path  code  dominated  the  auction 
code.  The  auction  code  had  the  greatest  difficulty  when  the  cost  range  was  the  smallest, 
i.e.  [0,100].  When  the  cost  range  was  at  least  [0,1000],  the  auction  algorithm  was  af¬ 
fected  very  little  by  the  cost  range.  The  shortest  augmenting  path  code  was  adversely 
affected  by  an  increasing  cost  range.  The  machine  type  affected  the  empirical  analysis. 
On  the  Sequent,  the  shortest  augmenting  path  code  was  4.41  times  faster  than  the  auction 
code,  on  the  IBM  it  was  3.87  times  faster,  while  on  the  VAX  it  was  2.79  times  faster. 
This  confirms  our  belief  that  the  comparative  performance  of  two  codes  is  intimately 
linked  to  the  hardware,  operating  system,  and  compilers  used.  For  our  tests  the  IBM  was 
running  CMS  and  both  the  Sequent  and  the  VAX  were  running  UNIX^t.  Our  results 
contradict  the  widely  held  belief  that  the  auction  algorithm  converges  faster  for  lower  cost 
ranges.  Bertsekas’  auction  code  for  dense  problems  was  not  sensitive  to  changing  the  cost 
range  from  [0,1000]  to  [0,100000].  In  fact  the  shortest  augmenting  path  code  is  much 
more  sensitive  to  the  larger  cost  ranges  than  the  auction  code  is.  We  also  observed  the 
well-known  phenomenon  of  the  auction  algorithm  that  a  significant  amount  of  the  compu¬ 
tational  time  is  spent  attempting  to  complete  the  last  few  assignments.  This  is  in  contrast 
to  the  shortest  augmenting  path  algorithm  that  achieves  one  more  assignment  with  each 
application  of  Dijkstra’s  algorithm.  Each  application  of  the  shortest  path  algorithm  can 
be  very  expensive,  but  it  is  guaranteed  to  result  in  one  more  assignment.  This  feature 
along  with  the  extensive  pre-processing  to  obtain  a  good  set  of  partial  assignments  makes 
this  approach  work  extremely  well. 

t  UNIX  is  a  trade  mark  of  AT«feT  Bell  Laboratories. 
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We  also  developed  a  modification  of  the  shortest  augmenting  path  code  which  used  a 
Dijkstra  two-tree  shortest  path  algorithm  (see  Helgason,  Kennington  and  Stewart  [1988]), 
I  but  that  system  was  not  competitive  with  the  original  shortest  augmenting  path  implemen¬ 

tation.  At  the  termination  of  the  classical  Dijkstra  shortest  path  algorithm,  all  the  infor¬ 
mation  required  to  update  the  duals  is  available  and  the  dual  update  can  be  executed  very 
I  efficiently.  The  two-tree  Dijkstra  method  can  obtain  the  shortest  augmenting  path  faster 

than  the  classical  Dijkstra  shortest  path  method;  however,  additional  work  is  required  to 
discover  which  duals  must  be  changed  and  by  what  amount.  The  overhead  required  for 
i  the  dual  variable  updates  exceeded  the  potential  benefits  of  the  two-tree  Dijkstra  method 

for  finding  the  shortest  augmenting  path. 


i 
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2.  A  PARALLEL  SHORTEST  AUGMENTING  PATH  CODE 

This  section  contains  a  description  of  the  algorithm  followed  by  an  empirical  analysis 
which  identifies  the  computationally  expensive  steps  of  the  software  implementation  of 
this  method.  By  using  prescheduled  data  partitioning,  the  operations  of  these  computa¬ 
tionally  expensive  steps  were  allocated  among  multiple  processors  and  speedup  was 
measured  for  computational  experiments  using  up  to  ten  processors. 

The  algorithm  of  Jonker  and  Volgenant  [1987]  may  be  divided  into  three  procedures 
as  follows: 

(i)  column  reduction, 

(ii)  augmenting  row  reduction,  and 

(iii)  augmentation  using  a  shortest  path  procedure. 

Procedures  (i)  and  (ii)  are  heuristics  which  provide  an  advanced  starting  solution  for 
procedure  (iii)  which  is  an  exact  method.  The  details  of  the  algorithms  are  given  below. 
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THE  SHORTEST  AUGMENTING  PATH  ALGORITHM  (SAP) 
Input: 

1.  The  problem  size,  n. 

2.  The  nxn  cost  matrix,  c(i,  j). 

Output: 

1.  x[i]  =  j  implies  that  man  i  is  assigned  to  job  J. 

2.  y[j]  =  i  implies  that  job  j  is  assigned  to  man  i. 

3.  v[j]  denotes  the  dual  variable  associated  with  job  j. 
begin 

call  procedure  COLUMN  REDUCTION 
call  procedure  AUGMENTING  ROW  REDUCTION 
call  procedure  BUILD  A  TREE 
end 
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procedure  COLUMN  REDUCTION 
begin 

1.  x[i]  0,  i  =  1,  ....  n; 

2.  for  j  =  1,  n 

3.  fx*-  min  {c[i,  j]  :  i  =  1,  ....  n}; 

4-  v[j]  ♦-  and  let  i*G  {i:  c[i,  j]  =  /r  }; 

5.  if  x[i*]  =  0,  then  x[i*]  j,  y[j]  i*; 

6.  end  for 

7.  for  i  *  1,  ,,,,  n 

8.  if  x[i]  p*  0,  then 

9-  fx*-  min  {c[i,  j]  -  v[j]:  j  =  1,  ....  n  and  j  ^  x[i]}; 

10.  v[x[i]]  v[x[i]]  -  fx\ 

11.  end  if 

12.  end  for 

end 
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procedure  AUGMENTING  ROW  REDUCTION 


begin 

13.  1  ^  0,  t  *-  0; 

14.  for  i=l,  ....  n 

15.  if  x[i]  =  0,  then  1  *-  1+1,  f[l]  i; 

16.  end  for 

17.  if  1  =  0,  then  terminate  with  an  optimum; 

18.  m  *-  1,  k  1,  1  ♦-  0; 

19.  i  *-  f[k],  k  *-  k+1; 

20.  Ui  ♦-  min{c[i,  j]  -  v[j]:  j=l,  ....  n},  let  ji  G{j:  c[i,  j]  -  v[j]  =  Ui}, 

U2  ♦-min{c[i,  j]-  v[j]:  j=l,  ...,  n  and  j  ^  ji},  let  }2  G  {j:  c[i,  j]  -  v[j]  =  U2 
and  j  5^  ji }; 

21.  ii  y[  ji]; 

22.  if  Ui  <  U2  ,  then  v[  ji  ]*-  v[  ji  ]  +  Ui  -  U2  ; 

23.  else  if  ii  =  0,  then  go  to  26,  else  ji*-  k,  ‘i  *“y[  ii); 

24.  if  ii  =  0,  then  go  to  26; 

25.  if  Ui  <  U2  ,  then  k  ^k-1,  f[k]  ir,  else  1  *-1+1,  f[l]*-  ii; 

26.  x[i]  *-  ji,  y[  jil*-i; 

27.  if  k  s  m  go  to  19; 

28.  t  *-  t+1; 

29.  if  1  >  0  and  t  <2,  then  go  to  18; 
end 
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procedure  BUILD  A  TREE 
begin 

30.  m-^1 

31.  for  1  =  1,  ....  m 

32.  i*  --  f[l]; 

33.  READY  —  4),  TODO  {1,  ....  n},  RSINK  —  {j:  yO]  =  0}; 

34.  dU]  *-  c[i*,  j]  -  v[j],  pred[j]  ♦-i*.  j=l,  ....  n; 

35.  II  min{d[j]:  jG  TODO},  SCAN  {j:  dD]  =  //.  jS  TODO),  TODO  — 
TODONSCAN; 

36.  if  SCAN  n  RSINK  ^  then  go  to  45; 

37.  for  all  ji  G  SCAN 

38.  i  ^  y[  ji  ],  h  c[i,  ji  ]  -  v[  j,  ]  -  /t; 

39.  for  all  jG  TODO 

40.  p  c[i,  j]  -  v[j]  -  h; 

41.  if  p  <  d[j],  then  d[j]  ^  p,  pred[j]  ♦-i; 

42.  end  for 

43.  READY  *-  READY  IJ  { ji  ); 

44.  end  for 

45.  go  to  35 

46.  v[j]  v[j]  +  d[j]  -  II,  for  all  JG  READY; 

47.  letjG  SCAN  fl  RSINK 

48.  i  ♦-  pred[j],  y[j]  *-  i,  k  •*-},  j  •^x(i],  x[i]  ^k; 

49.  if  i  ^  i*  go  to  48; 

50.  end  for 
end 
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Sixty  test  problems  were  run  and  the  percentage  of  computer  time  allocated  to  each  of 
these  procedures  is  tabulated  in  Table  2.  All  runs  were  made  on  a  Symmetry  S81  from 
Sequent  Computer  Systems,  Inc.  This  Symmetry  S81  is  a  multiprocessor  system  with  32  • 

Mbytes  of  shared  memory  and  twenty  Intel  80386  cpu’s.  For  this  study,  all  codes  used 
only  integer  arithmetic  and  did  not  make  use  of  math  co-processors.  Note  that  for  these 
problems,  an  average  of  18%  of  the  time  was  consumed  by  the  column  reduction,  22%  • 

was  consumed  by  the  augmenting  row  reduction,  and  60%  by  the  tree  building  activities. 

Within  these  three  procedures,  steps  3,  9,  20,  35,  and  39  through  42  are  the  most  time 
consuming.  The  percentage  of  time  allocated  to  these  steps  is  tabulated  in  Table  3,  with  • 

over  one-half  of  the  computer  time  attributed  to  steps  39  -  42. 


By  using  prescheduled  data  partitioning  on  steps  3,  9,  20,  and  39-42,  a  parallel  SAP 
code  was  developed.  This  software  runs  in  sequential  mode  until  one  of  the  key  steps  (3, 
9,  20  or  39)  is  reached.  Each  of  these  steps  is  executed  in  parallel  followed  by  a  process 
synchronization  and  a  return  to  sequential  mode.  The  objective  is  to  minimize  the  over¬ 
head  for  parallel  processing.  The  computational  times  are  shown  in  Table  4  with  the 
corresponding  speedups  presented  in  Table  5.  As  shown  in  Table  5  under  the  column 
entitled  1  cpu,  the  overhead  for  parallel  processing  is  approximately  20%.  This  means 
that  the  speedup  is  limited  to  at  most  5  and  the  actual  speedups  using  ten  processors 
ranged  from  a  low  of  3.27  to  a  high  of  4.32.  As  expected,  the  speedups  improved  as  the 
problem  size  increased  (see  Figure  2).  The  speedups  were  also  slightly  better  for  the 
fifteen  problems  having  the  smallest  cost  range.  This  is  due  to  the  fact  that  the  pre-proc¬ 
essing  heuristics  are  more  effective  on  these  problems  as  shown  in  Table  2  and  the  pre¬ 
processing  steps  require  less  overhead  for  parallel  processing  than  does  the  tree  building 
procedure.  We  also  parallelized  step  35;  however,  the  synchronization  required  for  updat¬ 
ing  the  SCAN  and  TODO  lists  exceeded  the  benefit  of  the  parallelization. 
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For  the  problem  sizes  and  cost  ranges  analyzed,  the  times  in  Table  4  are  the  best 
times  that  we  have  seen  on  the  Symmetry  S81.  The  parallel  shonest  augmenting  path 
code  is  a  powerful  tool  that  can  easily  solve  all  1,000,000  arc  dense  assignment  problems 
in  less  than  sixteen  seconds  using  six  processors  and  less  than  twelve  seconds  using  ten 
processors. 

Experiments  with  a  parallel  version  of  the  auction  algorithm  on  a  Sequent  Symmetry 
S81  may  be  found  in  Barr  and  Christiansen  [1989].  They  used  their  parallel  assignment 
code  to  solve  a  1,000,000  arc  dense  assignment  problem  with  a  cost  range  of  [1,1000]. 
Their  best  time  on  a  Sequent  Symmetry  S81  using  six  processors  exceeded  five  minutes. 
Performance  characteristics  of  the  Jacobi  and  Gauss-Seidel  versions  of  Bertsekas’  auction 
algorithm  on  an  Alliant  FX/8  may  be  found  in  Kempa,  Kennington,  and  Zaki  [1990]. 
They  found  that  the  Gauss-Seidel  version  was  generally  more  efficient  in  the  scalar  envi¬ 
ronment,  but  the  Jacobi  version  was  generally  more  efficient  in  the  parallel  environment. 
The  observed  speedup  due  to  vectorization  was  higher  than  that  due  to  concurrency  for 
both  versions.  The  performance  characteristics  of  several  software  implementations  of 
this  algorithm  for  the  Connection  Machine  CM2  may  be  found  in  Wein  and  Zenios 
[1990a,  1990b].  Their  experimentation  indicated  that  the  auction  algorithm  was  not  well 
suited  for  the  architecture  of  the  CM2.  Bertsekas  and  Castanon  [1989]  compare  a  variety 
of  synchronous  and  asynchronous  implementations  of  the  auction  algorithm  on  the  Encore 
Multimax.  They  found  that  their  asynchronous  implementations  were  superior  to  their 
synchronous  systems. 


3.  SUMMARY  AND  CONCLUSIONS 

The  empirical  analysis  presented  in  this  study  indicates  that  for  dense  assignment 
problems  having  a  size  up  to  800x800,  the  shortest  augmenting  path  software  is  faster 
than  the  auction  algorithm  software.  This  conclusion  was  based  on  test  runs  with  sixteen 


randomly  generated  test  problems  with  four  different  cost  ranges  and  run  on  three  differ¬ 
ent  serial  machines.  Contrary  to  the  widely  held  belief  that  the  auction  algorithm  per¬ 
forms  worse  as  the  cost  range  increases,  we  found  this  not  to  be  the  case.  We  believe  that 
Bertsekas’  Version  1.0,  June  1988  implementation  has  eliminated  this  difficulty  via  the 
use  of  adaptive  scaling.  We  did  observe  the  difficulty  with  the  "end  game”  in  which  an 
inordinate  amount  of  time  is  required  to  complete  the  last  few  assignments.  The  shortest 
augmenting  path  method  has  the  attractive  feature  that  each  time  a  shortest  path  is  calcu¬ 
lated,  one  new  assignment  is  made.  We  found  that  the  shortest  augmenting  path  code 
was  adversely  affected  by  an  increasing  cost  range.  As  the  cost  range  increases,  larger 
trees  must  be  developed  by  Dijkstra’s  algorithm  to  obtain  the  shortest  path  from  an  unas¬ 
signed  man  to  an  unassigned  job. 

By  using  the  technique  of  prescheduled  data  partitioning,  we  parallelized  the  shortest 
augmenting  path  code  of  Jonker  and  Volgenant  [1987]  for  the  Sequent  Symmetry  S81. 
Speedups  of  three  to  four  were  achieved  on  1200x1200  dense  problems  using  ten  proces¬ 
sors.  Remarkably,  1,000,000  arc  dense  assignment  problems  were  solved  using  this  par¬ 
allel  code  in  less  than  twelve  seconds  (wall  clock  time).  Even  though  this  code  was 
developed  for  a  particular  multiprocessor  system  with  shared  memory,  it  can  be  used  with 
any  shared  memory  parallel  processing  system. 
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Table  1.  Comparison  of  Sequential  Algorithms  on  Dense  n  x  n 
Assignment  Problems  (all  times  are  in  seconds) 
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Table  2.  Total  Time  Allocated  to  the  Major  Routines  (all  times  are 
the  average  of  five  dense  n  x  n  assignment  problems) 
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Table  3.  Percentage  of  Total  Time  Attributed  to  the  Most  Time  Consuming  Tasks 
(all  times  are  the  average  of  five  dense  n  x  n  assignment  problems) 
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Table  4.  The  Parallel  Shortest  Augmenting  Path  Code  (all  times  are  the 
average  in  seconds  for  five  dense  n  x  n  assignment  problems) 
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Table  5.  Speedup  for  Dense  n  x  n  Assignment  Problems 
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ABSTRACT 


This  manuscript  presents  a  new  heuristic  algorithm  to  find  near  optimal  integer  solu¬ 
tions  for  the  singly  constrained  assignment  problem.  The  method  is  based  on  Lagran- 
gean  duality  theory  and  involves  solving  a  series  of  pure  assignment  problems.  The 
software  implementation  of  this  heuristic  successfully  solved  problems  having  one-half 
million  binary  variables  (assignment  arcs)  in  less  than  fifteen  minutes  of  wall  clock 
time  on  a  Sequent  Symmetry  S81  using  a  single  processor.  In  no  case  did  our  heuristic 
fail  to  obtain  a  feasible  integer  solution  guaranteed  to  be  within  10%  of  an  optimum. 

In  computational  comparisons  with  MPSX  and  OSL  on  an  IBM  3081D,  the  specialized 
software  was  from  one  hundred  to  one  thousand  times  faster.  Our  software  proved  to 
be  very  robust  as  well  as  fast  The  robustness  is  due  to  an  elaborate  scheme  used  to 
update  the  Lagrangean  multipliers  and  the  speed  is  due  to  the  fine  code  used  to  solve 
the  pure  assignment  problems.  We  also  present  a  modification  of  the  algorithm  for  the 
case  in  which  the  number  of  jobs  exceeds  the  number  of  men  along  with  an  empirical 
analysis  of  ihe  modified  software. 
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I.  INTRODUCnON 


The  singly  constrained  assignment  problem  is  to  determine  a  least  cost  as¬ 
signment  of  n  men  to  n  jobs  such  that  a  single  additional  constraint  is  satisfied. 
This  model  is  a  special  case  of  a  binary  linear  program  and  may  be  stated  mathe¬ 
matically  as  follows: 


minimize  ^  CijXy 

(1) 

(i.j)  e  E 

subject  to  ^  Xy 

=  1  ,  i  =  1,  ...,n 

(2) 

j:  (ij)  e  E 

I  ^ij 

=  1  ,  j  »  1,  ...,n 

(3) 

i:  (i.j)  e  E 

Xy 

G  {0,1}  ,  all  (i,j)  GE 

(4) 

X  aij^ij 

£  b 

(5) 

(i.  j)  e  E 


where  Cy  denotes  the  cost  for  assigning  man  i  to  job  j,  ay  denotes  the  coefficient  of 
Xij  in  the  side  constraint,  b  denotes  the  right-hand-side  of  the  side  constraint,  E  is 

the  set  of  (man,  job)  pairs  corresponding  to  eligible  assignments,  and  Xij  =  1  im¬ 
plies  that  man  i  is  assigned  to  job  j.  In  order  to  simplify  the  notation  we  let  a 
denote  the  vector  corresponding  to  the  coefficients  in  (5),  x  denote  the  vector  cor¬ 
responding  to  the  binary  decision  variables,  c  denote  the  vector  of  costs,  and 
T=  {  X  :  (2),  (3),  and  (4) }.  Then  the  singly  constrained  assignment  problem  can  be 


stated  as  P,  =  min  {cx  :  x  e  T  and  ax  <  b},  and  it  is  well-known  that  P,  is 
NP-complete. 

The  singly  constrained  assignment  was  first  used  by  Brans,  Leclercq,  and 
Hansen  [1973]  to  model  the  core  management  of  a  pressurized  water  reactor.  The 
problem  is  given  two  sets  of  fresh  and  exposed  assemblies  determine  the  location 
pattern  of  these  assemblies  which  maximizes  the  reactivity  of  the  core  under  a 
constraint  on  power-distribution  form  factor.  After  linearization,  Brans,  Leclercq, 
and  Hansen  reduce  the  problem  to  a  sequence  of  singly  constrained  assignment 
problems  and  propose  an  implicit  enumeration  routine  to  solve  these  problems. 
Our  work  on  P,  was  motivated  by  models  which  had  been  developed  by  analysts  at 
the  Navy  Personnel  Research  and  Development  Center  in  San  Diego.  These  models 
involve  the  optimal  assignment  of  men  to  jobs  under  a  budget  constraint  related  to 
relocation  cost. 

The  first  specialized  algorithm  for  P,  was  presented  by  Gupta  and  Sharma 
[1981].  Their  method  was  a  straight  forward  enumeration  scheme  and  they  present 
no  computational  results.  Aggarwal  [1985]  presents  an  improved  algorithm  for  P, 
which  combines  Lagrangean-relaxation  with  the  enumeration  algorithm  of  Gupta 
and  Sharma  [1981]  to  obtain  an  optimal  solution.  No  computational  results  for  this 
method  is  presented.  Mazzola  and  Neebe  [1985]  develop  a  branch-and-bound  al¬ 
gorithm  for  the  constrained  assignment  problem.  To  generate  solutions  at  each 
node  of  the  branch-and-bound  tree  they  developed  a  method  that  combines  a 
restricted  basis  pivoting  rule  followed  by  a  subgradient  routine.  Their  empirical 
evaluation  of  the  heuristic  and  the  branch-and-bound  algorithm  indicates  that  both 
procedures  are  satisfactory  for  dense  assignment  problems  of  size  up  to  100x100. 
Bryson  [1991]  presents  an  algorithm  based  on  the  parametric  programming  proce¬ 
dure  of  Gass  and  Saaty  [1955].  The  largest  problem  they  solved  had  fewer  than 
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2000  edges  and  did  not  exploit  the  network  structure  of  this  model.  This  is  in 
contrast  to  our  empirical  investigation  in  which  the  small  problems  have  over  a 
quarter  of  a  million  edges.  Ball,  Derigs,  Hilbrand,  and  Metz  [1990]  present  an 

algorithm  which  will  solve  the  special  case  of  P,  in  which  aij  c  {0,1}  for  all  (i,j). 

The  work  by  Klingman  and  Russell  [1975]  and  Barr,  Farhangian,  and  Ken- 
nington  [1986]  is  for  the  continuous  version  of  Pi  rather  than  the  binary  version. 
That  is,  the  above  work  would  be  applicable  for  the  model  in  which  (4)  is  replaced 

with  the  nonnegativity  constraint  Xy  >  0,  all  (i,j)  e  E. 

Klingman  and  Russell  [1978]  developed  a  simplex  based  method  for  the 
transportation  problem  with  a  single  side  constraint  and  Glover,  Karney,  Klingman, 
and  Russell  [1978]  developed  a  simplex  based  method  for  the  transshipment  prob¬ 
lem  with  a  single  side  constraint.  Authors  of  both  papers  state  that  codes  based  on 
their  procedures  are  significantly  faster  than  the  UP  code  APEX-in  and  they  both 
obtain  an  integer  solution  for  the  problem  with  an  inequality  side  constraint  by 
pivoting  into  the  basis  the  slack  variable  associated  with  the  side  constraint.  This 
yields  a  triangular  basis  which  automatically  produces  an  integer  solution.  Empiri¬ 
cally  these  integer  solutions  were  found  to  be  within  1%  of  optimality. 

Since  (l)-(5)  is  a  binary  linear  program,  all  the  literature  on  integer  pro¬ 
gramming  applies  (see  Everett  [1963],  Geoffrion  [1969,  1974],  Salkin  [1974], 
Shapiro  [1971,  1979],  Parker  and  Rardin  [1988],  Nemhauser  and  Wolsey  [1988]). 
In  practice  most  integer  programming  models  are  either  solved  as  a  linear  program 
and  the  solutions  are  rounded  using  some  heuristic  or  branch-and-bound  is  used 
in  an  attempt  to  obtain  a  solution  within  a  prespecified  tolerance. 

The  objective  of  this  study  is  to  present  a  new  algorithm  for  the  singly  con¬ 
strained  assignment  problem.  The  algorithm  is  for  the  problem  having  an  inequal- 


ity  side  constraint.  We  also  show  how  this  algorithm  can  be  used  to  solve  problems 
in  which  (5)  is  an  equality  and  problems  in  which  (3)  is  replaced  with 

2]  ;Cij  <  1  ,  j  =  l,...,m.  (6) 

i  ;  (i.j)  e  E 

The  algorithm  uses  a  Lagrangean  relaxation  and  solves  a  series  of  assignment 
problems.  Empirical  results  demonstrate  the  superiority  of  this  approach  over  com¬ 
peting  software.  Problems  having  one -half  million  arcs  were  solved  in  less  than 
fifteen  minutes  on  a  Sequent  Symmetry  S81  using  a  single  processor. 


II.  THE  ALGORITHM 


i 


In  this  section  we  present  a  heuristic  algorithm  for  the  singly  constrained 
assignment  problem,  P,  =  min  {  cx  :  x  €  T,  ax  <  b  }.  Let  Pa  =  min  {  cx  :  x  €  T  } 
be  a  feasible  region  relaxation  of  Pi,  and  let  P3  =  min  {  ax-b  :  x  e  T  }.  By 
dualizing  the  side  constraint  one  obtains  a  Lagrangean  relaxation  of  Pi  given  by 

=  min  {  cx  +  y9(ax-b)  :  x  €  T  }  where  yS  is  the  Lagrangean  multiplier.  Let 
v[P]  denote  the  optimal  objective  function  value  for  any  problem  P,  then  a 

Lagrangean  dual  for  Pi  is  Dh  =  max  {  v[P(y8)]  :  ^  >  0  }. 

We  attempt  to  solve  the  Lagrangean  dual,  Dl,,  by  solving  the  problems  Pa, 

P3  and  a  series  of  F(P)  for  different  values  of  Pa  is  solved  to  obtain  the  in¬ 
itial  lower  bound,  lb,  and  to  determine  if  the  side  constraint  is  redundant.  The 
solution  to  P3  either  establishes  that  Pi  has  no  feasible  solution  or  provides  an 

initial  upper  bound,  ub.  Solving  the  Lagrangean  relaxation,  always  pro¬ 
vides  a  lower  bound  and  if  the  optimal  solution,  is  feasible  for  P,,  then  cx^ 
is  an  upper  bound. 

It  is  well  known  that  v[P03)]  is  a  piece-wise  linear  concave  function  over 

R"^.  Let  xfl  denote  an  optimum  for  P(/S)  at  any  point  ^ .  Let  ^  denote  an  opti¬ 
mum  for  Dl,.  The  optimum  for  Dh  may  be  unique  as  illustrated  in  Figure  1  or 
may  have  an  infinite  number  of  solutions  as  illustrated  in  Figure  2.  For  the  case 

illustrated  in  Figure  1,  for  all  ax^  <  b  and  xp  is  feasible  for  Pi,  and  for 

all  ^  <  jS*,  ax^  >  b  and  xp  is  not  feasible  for  Pi.  For  the  case  illustrated  in  Fig¬ 
ure  2,  for  all  yS  >  /S’,  either  ax^  <  b  and  xp  is  feasible  for  Pi  or  ax^j  =  b  and  xp 
is  an  optimum  for  P,.  Similarly  for  all  ^  *  either  ax^  >  b  and  xp  is  not  fea¬ 

sible  for  Pi  or  zxp  =  b  and  xp  is  an  optimum  for  P,. 
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Let  tt  and  v  denote  upper  and  lower  bounds,  respectively  on  Then  a 
bisection  search  can  be  used  to  obtain  Since  obtaining  each  value  of  v[P(/3)] 

for  a  given  requires  solving  an  assignment  problem,  we  attempt  to  obtain  a 
small  interval  of  uncertainty  [u,v]  prior  to  initiating  the  bisection  search.  It  is 
well-known  that  if  the  duality  gap  is  zero,  then  v[Dli]=v[P,].  That  is, 


CX0‘  ■¥■  /9’(ax^  -  b)  =  v[Pi]  ,  or 

(7) 

=  (v[Pi]  -  cx^)/(ax^  -b). 

(8) 

Prior  to  using  the  bisection  search,  we  obtain  estimates  for  /S’  using  (8)  with 
v[Pi]  replaced  by  (lb+ub)/2,  cx^  replaced  by  ub,  and  ax^*  replaced  by  axj.  That 
is,  initial  values  of  /3  are  given  by: 

^^{ub-lb)/{2(zx^-b)).  (9) 

The  basic  strategy  of  a  heuristic  algorithm  for  P,  follows: 
step  1.  find  an  initial  lower  bound, 
step  2.  find  an  initial  upper  bound, 

while  ax^  <  b  and  stopping  criteria  not  satisfied  repeat  steps  3  and  4, 

step  3.  use  (8)  to  estimate  then  solve  P(/3), 
step  4.  update  u  and  ub  and  if  possible  v  and  lb, 
while  stopping  criteria  is  not  satisfied  repeat  steps  5-7, 

step  5.  let  P  =  (m+v)/2  and  solve  P(/S), 

step  6.  if  zxp  <  b  update  u,  ub,  and  if  possible  lb, 

step  7.  if  ax^  >  b  update  v  and  if  possible  ub  and  lb. 


The  algorithm  terminates  if  one  of  the  following  conditions  is  satisfied: 

(i)  ax^=b,  since  by  strong  Lagrangean  duality  xfi  is  an  optimum. 

(ii)  ub-lb  <  .1  lb,  since  for  Navy  Personnel  Research  and  Development 
Center  models,  an  assignment  that  is  guaremteed  to  be  within  10% 
of  an  optimum  is  considered  to  be  acceptable. 

(iii)  iter  *  maxiter,  since  we  cannot  guarantee  that  criterion  (i)  or  (ii) 
will  ever  be  satisfied,  we  also  stop  after  solving  a  prespecified 
number  of  assignment  problems. 

The  ASSIGN+1  algorithm  for  P,  is  described  below: 

Input: 

1.  The  cost  vector,  c. 

2.  The  side  constraint,  vector  a  and  constant  b. 

3.  The  set  of  (man,  job)  pairs  corresponding  to  eligible  assignments,  E. 

4.  The  maximum  number  of  iterations  permitted  before  termination,  maxiter. 
Output: 

1.  The  solution  vector,  y. 

2.  A  lower  bound  for  P,,  lb. 

3.  The  objective  value  corresponding  to  y,  ub. 

4.  The  termination  status.  If  P,  has  no  feasible  solution,  then 

status  -  infeasible;  if  an  optimal  solution  was  obtained  then  status^optimal, 
otherwise,  status  -  feasible. 


algorithm  ASSIGN+1: 
begin 

iter:=0,  v:=0; 

FIND  INITIAL  LOWER  BOUND; 

FIND  INITIAL  UPPER  BOUND; 

while  7>0  and  jter<maxiter  do  FIND  INITIAL  U; 

while  v*0,  7<0,  \ub-Ib\>.l\lb\,  and  iter<niaxiter  do  REDUCE  U; 

while  |u6-/6|>.l|lbl  and  iter<maxiter  do  BISECTION; 

if  u>fi,  then  u  and  let  Xfi  be  an  optimum  for  P(;S); 
y:= 

end; 

procedure  FIND  INITIAL  LOWER  BOUND 
begin 

let  X2  be  an  optimum  for  P2.  iter:=  iter+1,  lb:=  CX2; 
if  ax2  <  b,  then  status:^  optimal,  y:=  X2,  ub:=  cy,  stop; 

end; 

procedure  FIND  INITIAL  UPPER  BOUND 
begin 

let  X3  be  an  optimum  for  P3,  iter:=  iter+1,  8:=  ax3-b; 
if  8>0,  then  status:-  infeasible,  stop; 

else  status:=^  feasible,  ub:^  cxa,  7:=  -1,  ilb-ub)l(2hy. 


Procedure  FIND  INITIAL  U 
begin 

let  xp  be  an  optimum  for  P(^),  iter:=‘  iter+l; 

7;=  ax/j  -b,  /6:=  max{/i>,v[P(^)l}; 
if  7=0,  then  status:^  optimal,  y:=  xp ,  ub:=  cy,  stop; 
if  7<0  then  ub:=  min{w6.  cx;?},  u:= 
else  v;=max{v,^}  ,  ^:=  2^; 

end; 

procedure  REDUCE  U 
begin 

if  ^=(/6-u6)/(28).  then  ^:=^/2; 
else  fi:=  (/i>-«6)/(28); 

let  x^  be  an  optimum  for  P(^),  iter:=^  /ter+1; 

7:=  ax^-b,  /6;=  max{/6,v[P(/S)]}; 
if  7=0,  then  status:^  optimal,  y:=  xp ,  ub:=  cy,  stop; 
if  7<0,  then  min  {uh,cx^},  u:=  min  {u,;3}; 
else  v:=  fi", 

end; 
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procedure  BISECHON 
begin 

yS:=  (m+v)/2; 

let  X(t  be  an  optimum  for  P(/S),  iter:*  »er+l; 

7:*  ax^-b,  lb:=  max{/d,v[P(^)]}; 

f 

if  7*0,  then  status:^  optimal,  y:=  xp ,  ub:=  cy,  and  stop; 
if  7<0,  then  iih:=  min{uh,  cx^  },  and  u:=  min{u,^}; 
else  v:*  max{v,;S}; 

end; 

It  is  well-known  that  the  main  difficulty  with  the  Lagrangean  approach  is  the 

selection  of  the  sequence  of  multipliers,  so  that  software  implementations  using 
these  rules  are  robust.  Convergence  results  can  be  found  in  Allen,  Helgason,  Ken- 
nington,  and  Shetty  [1987].  Most  of  the  steps  in  the  above  algorithm  are  related  to 

the  elaborate  scheme  for  updating  ^  which  was  developed  through  empirical  analy- 


in.  AN  EQUALITY  SIDE  CONSTRAINT 
In  this  section  we  show  that  the  ASSIGN+1  algorithm  is  also  applicable  to  the 
assignment  problem  with  an  equality  side  constraint,  P'l  =  {  min  cx  :  x  €  T,  ax=b  }. 
That  is,  we  show  that  if  P'l  is  a  feasible  problem,  then  solving  P’j  is  equivalent  to 
solving  either  Pi  or  P"i  =  {  min  cx  :  x  €  T,  ax  >  b  }  or  Pj. 

Proposition  1  Let  P'l  be  a  feasible  problem,  and  let  Xj  be  an  optimum  for  Pj. 
If  ax2  <  b,  then  vfP'J  =  v[P"J. 

Proof  Let  F(P)  denote  the  feasible  region  for  any  problem  P,  x”,  be  an  optimum 
for  P''i  and  suppose  vfP'J  5^v[P''i].  Since  v[P',]  516  v[P''i],  and  F(P'i)c  F(P'’0, 
then  v[P',]  >  v[P"i].  Therefore  ax"i>  b-  Let  x''i+X.,d,  with  0  <  X.i  <  1  and  d,  = 
Xa-  x''i  be  the  line  segment  with  end  points  Xj  and  x”, .  This  line  segment  can  also 
be  written  as  X2+X2d2  with  0  <  X2  <  1  and  da  =  x'',-X2.  Note  that  da  is  a  feasible 
direction  for  Pa  at  the  point  Xa,  since  Xa  6  F(Pa)  and  x'’ie  FOPa).  Let  D(jg(y)denote 
the  directional  derivative  of  the  function  g(x)  in  the  direction  d  at  the  point  y. 
Letting  g(A:)  be  the  linear  function  cx,  we  know  that  Dd,g(x"i)=  -Dd^gCx:)  Since 
ax”,  >  b  and  axa  <  b,  there  exists  a  X,such  that  x  =  x”,+Aidi  and  ax=b.  Since  x  e 
F(P'i)  and  v(P,]  >  v[P”i],  cx  >  cx",  .  Likewise,  cx  >  cxa.  Hence  Dd,g(x)  <  0  and 

DdjgCjO  <  0,  a  contradiction.  Hence  v[P',]=  v[P”.].  I 

Proposition  2  Let  P',  be  a  feasible  problem,  and  let  Xa  be  an  optimum  for  Pa- 
If  axa  >  b,  then  v[P',]  =  v[P,]. 

Proof  Similar  to  the  proof  for  the  Proposition  1. 

Proposition  3  Let  Xa  be  an  optimum  for  Pa.  If  axa  =  b,  then  Xa  solves  P',. 
Proof  Let  F(P)  denote  the  feasible  region  for  any  problem  P,  F(P',)cF(Pa)  and 
Xa  6  F(P,),  then  Xa  is  an  optimum  for  P',.  I 
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As  Propositions  1  and  2  indicate,  the  equality  problem  can  be  solved  by 
®  solving  an  inequality  problem.  The  solution  to  Pj  indicates  whether  one  needs  to 

solve  the  problem  with  ax  <  b  or  ax  >  b.  Since  finding  a  set  of  assignments  for 
which  ax  =  b  may  not  always  be  possible  and  since  from  our  work  with  the  Navy 
®  Personnel  Research  and  Development  Center  we  have  found  that  most  constraints 

can  be  slightly  violated  and  acceptable  solutions  can  still  be  obtained,  we  replace 
ax-b  =  0  with  |ax-b|  <  €.  That  is,  instead  of  P'l  we  attempt  to  solve  min  {  cx  :  x  e 
•  T  and  |ax-b|  <  e  }.  For  all  our  work  we  set  €  to  .01b. 
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IV.  EMPIRICAL  ANALYSIS 

The  algorithm  ASSIGN+1  has  been  implemented  in  software  and  empirically 
analyzed  on  both  a  Sequent  Symmetry  S81  and  an  IBM  3081D  for  both  inequality 
and  equality  side  constraints.  Both  codes  are  written  in  FORTRAN  and  use  SEMI 
(see  Kennington  and  Wang  [1990a,  1990b])  to  solve  the  assignment  problems. 
SEMI  is  an  implementation  of  the  shortest  augmenting  path  algorithm  for  sparse 
semi-assignment  problems  and  is  claimed  to  be  one  of  the  fastest  codes  available 
for  both  assignment  and  semi-assignment  problems. 

We  developed  a  test  problem  generator  with  the  following  inputs;  (i)  the 

number  of  men,  (ii)  the  arc  density,  (iii)  the  maximum  cost,  c,  and  (iv)  the  side 
constraint  multiplier,  k.  Both  the  costs  and  the  side  constraint  coefficients  are  uni¬ 
formly  distributed  over  the  range  [0  -  £].  We  randomly  generate  a  feasible  assign¬ 
ment,  x,  and  determine  b  for  this  assignment  so  that  ax  =  b.  The  right-hand-side, 

b,  for  the  side  constraint  is  set  to  kb.  For  the  inequality  problems  and  k  =  1,  we 
observed  that  for  most  problems,  the  side  constraint  was  redundant  and  therefore  a 
very  easy  problem.  As  k  becomes  smaller,  the  feasible  region  becomes  smaller  and 
for  sufficiently  small  k  the  problem  may  become  infeasible. 

The  generator  was  used  to  generate  the  eighty-one  inequality  problems  de¬ 
scribed  in  Table  1.  Under  the  column  entitled  “Side  Const”,  k  was  set  to  .2,  .5,  and 
.9  for  the  rows  entitled  “small”,  "medium”,  and  “large”,  respectively.  It  should  be 
noted  that  the  software  is  very  robust  as  a  function  of  the  magnitude  of  b  and  it 
requires  very  few  iterations  to  satisfy  the  optimality  criteria.  Near  optimal  solutions 
to  integer  programs  with  over  one-half  million  binary  variables  were  routinely 
obtained  in  less  than  fifteen  minutes. 


Table  2  gives  our  empirical  results  with  135  equality  problems.  Under  the 
column  entitled  “Side  Const”,  k  was  set  to  .2,  .5,  .9,  1.2,  and  1.5  for  the  rows 
entitled  “very  small”,  “small”,  “medium”,  “large”,  and  “very  large”,  respectively. 
The  software  is  very  robust  over  a  wide  range  of  input  parameters  and  performed 
very  well  on  all  these  problems. 

In  contrast  with  some  of  our  previous  experience  using  subgradient  optimiza¬ 
tion,  we  never  found  a  problem  that  caused  this  software  any  major  difficulty.  We 
attribute  the  robustness  of  this  software  to  the  elaborate  scheme  for  updating  the 
Lagrangean  multiplier  which  works  well  for  this  class  of  problems.  We  attribute 
the  speed  of  this  software  to  the  semi-assignment  software,  SEMI.  Of  course,  the 
version  of  SEMI  that  we  used  was  modified  to  handle  real  cost  as  opposed  to 
integer  data  required  by  the  version  described  in  Kennington  and  Wang  [1990a]. 

Tables  3  through  6  present  our  empirical  results  comparing  the  specialized 
software  for  the  singly  constrained  assignment  problem  with  both  MPSX  (see 
Mathematical  Programming  System  Extended  [1979])  and  OSL  (see  Optimization 
Subroutine  Library  [1990]).  If  the  ASSIGN+1  software  for  the  equality  side  con¬ 
straint  terminates  due  to  the  maximum  number  of  iterations  and  no  feasible  solu¬ 
tion  has  been  found,  then  the  current  best  known  Lagrangean  multiplier  is  used  to 
find  a  solution.  In  this  case  the  side  constraint  violation  will  exceed  1%.  This  oc¬ 
curred  for  four  of  sixteen  problems  presented  in  Tables  3-6.  In  the  worst  case,  the 
maximum  deviation  from  feasibility  was  1.65%.  That  is,  all  solutions  satisfied  the 
constraint  |ax-b|  <  0.0165b. 

All  the  MPSX  and  OSL  runs  were  made  with  default  parameter  settings.  A 
few  of  the  smaller  problems  were  successfully  solved,  but  the  times  were  from  two 
to  three  orders  of  magnitude  slower  than  those  for  the  specialized  software.  In 
addition  we  ran  a  specialized  network  with  side  constraint  code,  NETSIDE  (see 


Kennington  and  Whisman  [1990])  on  an  800x800  test  problem  in  an  attempt  to 
solve  the  linear  programming  relaxation  of  this  model.  Convergence  was  not 
achieved  after  two  hours  of  cpu  time  on  the  Sequent  Symmetry  S81. 

As  stated  we  have  modified  SEMI  to  handle  real  cost  coefficients  as  opposed 
to  the  integer  cost  coefficients.  Generally  this  modification  is  expected  to  increase 
the  execution  time  drastically,  but  in  this  case,  as  Table  7  indicates  this  increase 
was  less  than  ten  percent.  Results  presented  in  Table  7  are  the  average  wall  clock 
times  for  three  pure  assignment  problems,  running  on  a  Sequent  Symmetry  S81 
using  the  floating  point  accelerator. 


Table  1.  The  assignment  problem  with  an  inequality  side  constraint 


Problem 

Cost/Const 

Side  Const 

#  of 

Time’ 

Solution  Guaranteed 

Size 

Range 

RHS 

Iter 

(min) 

within  %  of  Opt. 

small 

7.33 

95.16 

0-1000 

medium 

7.33 

2.98 

93.28 

laree 

mm 

2.48 

97.88 

small 

8.33 

3.56 

92.81 

800x800 

0-10000 

medium 

7.33 

3.04 

94.10 

(256,000  arcs) 

large 

6.00 

2.67 

98.26 

small 

8.33 

92.56 

0-100000 

medium 

7.33 

94.14 

large 

6.00 

2.61 

98.32 

small 

6.00 

4.26 

95.17 

0-1000 

medium 

8.33 

6.43 

92.39 

large 

6.00 

4.10 

97.34 

small 

4.42 

1000x1000 

0-10000 

medium 

7.67 

5.56 

(400,000  arcs) 

large 

4.29 

97.21 

small 

w^sm 

4.51 

94.60 

0-100000 

medium 

7.67 

5.72 

93.60 

large 

97.33 

small 

6.67 

7.70 

94.42 

0-1000 

medium 

8.33 

10.00 

92.35 

large 

5.00 

5.36 

99.06 

small 

6.67 

8.18 

93.69 

1200x1200 

0-10000 

medium 

8.33 

10.36 

95.14 

(576,000  arcs) 

large 

6.00 

7.09 

96.87 

small 

6.67 

8.31 

93.83 

0-100000 

medium 

8.33 

10.43 

95.06 

large 

6.00 

6.99 

96.98 

1 


Times  are  wall  clock  time  on  a  Sequent  Symmetry  S81  using  one  processor. 


Table  2.  The  assignment  problem  with  an  equality  side  constraint 


Problem 

Side  Const 

#  of 

Time’ 

Solution  Guaranteed 

Size 

ESSSHl 

RHS 

Iter 

(min) 

within  %  of  Opt. 

very  large 

8.67 

4.06 

99.15 

large 

4.94 

99.64 

0-1000 

medium 

8.67 

4.08 

99.80 

small 

8.67 

4.14 

99.33 

very  small 

7.67 

3.45 

99.44 

very  large 

iWW 

99.39 

large 

99.76 

800x800 

0-10000 

medium 

9.00 

99.96 

(256,000  arcs) 

small 

9.00 

4.26 

100.00 

very  small 

8.67 

4.27 

99.93 

very  large 

8.67 

4.16 

99.64 

large 

10.67 

5.04 

99.76 

0-100000 

medium 

8.67 

4.15 

99.91 

small 

9.33 

4.42 

99.95 

very  small 

10.00 

4.53 

99.95 

very  large 

9.67 

7.82 

99.91 

large 

10.33 

8.32 

99.76 

0-1000 

medium 

8.00 

6.00 

99.84 

small 

10.33 

8.07 

99.79 

very  small 

9.00 

7.40 

100.00 

very  large 

8.44 

99.61 

large 

■OH 

8.49 

99.34 

1000x1000 

0-10000 

medium 

7.00 

5.36 

99.82 

(400,000  arcs) 

small 

10.67 

9.06 

99.53 

very  small 

10.67 

9.56 

100.00 

very  large 

8.41 

99.56 

large 

8.50 

99.78 

0-100000 

medium 

6.40 

99.77 

small 

8.78 

99.44 

very  small 

10.30 

9.04 

99.87 

very  large 

10.00 

11.93 

98.55 

large 

9.00 

10.39 

99.87 

0-1000 

medium 

9.33 

11.05 

99.99 

small 

10.33 

13.02 

99.43 

very  small 

9.33 

11.69 

99.74 

very  large 

9.67 

mm 

98.40 

large 

11.00 

ISQH 

99.82 

1200x1200 

0-10000 

medium 

8.67 

99.92 

(576,000  arcs) 

small 

10.00 

99.65 

very  small 

11.00 

mm 

99.71 

very  large 

12.12 

98.81 

large 

13.48 

99.54 

0-100000 

medium 

8.67 

10.34 

99.92 

small 

9.33 

11.72 

99.76 

very  small 

11.00 

14.45 

99.73 

'  Times  are  wall  clock  time  on  a  Sequent  Symmetry  S81  using  one  processor. 
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Table  3.  Comparisons  of  specialized  algorithms  with  MPSX  and  OSL  on  an  IBM  308 ID  (very  small  value  of  b  in  side  constraint) 
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Table  4.  Comparisons  of  specialized  algorithms  with  MPSX  and  OSL  on  an  IBM  308 ID  (small  value  of  b  in  side  constraint) 
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Table  5.  Comparisons  of  specialized  algorithms  with  MPSX  and  OSL  on  an  IBM  308  ID  (large  value  of  b  in  side  constraint) 
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Table  6.  Comparisons  of  specialized  algorithms  with  MPSX  and  OSL  on  an  IBM  308 ID  (very  large  value  of  b  in  side  constraint) 
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Table  7.  Comparison  of  the  integer  and  real  versions  of  pure  assignment  codes. 
(The  Weitek  floating  point  accelerator  was  activated  in  all  runs.) 


Problem  Size 

400x400 

800x800 

1000x1000 

1200x1200 

SEMI  (Secs.) 

(integer) 

3.66 

14.93 

26.36 

37.26 

ASSIGN+1  (Secs.) 
(floating  point) 

4.00 

/ 

16.38 

28.37 

40.71 

Increase  for  floating 
point  arithmetic 

9.56% 

9.71% 

7.62% 

9.25% 
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V.  THE  SINGLY  CONSTRAINED  UNBALANCED  ASSIGNMENT  PROBLEM 

Navy  personnel  assignment  problems  are  unbalanced  in  which  the  number  of 
jobs  n  exceeds  the  number  of  men  n.  After  dualizing  the  side  constraint  we  obtain 
an  unbalanced  pure  assignment  problem  whose  dual  is 


maximize  2 

(10) 

i  j 

Cjj  ~  Aj  —  JTj  s  0 , 

(i.j)  e  E 

(11) 

JTj  <0,  j  = 

1.  ...,m. 

(12) 

The  dual  variable,  tT],  is  associated  with  job  j  and  the  dual  variable,  A,-,  is  associated 
with  the  man  i.  The  dual  problem  for  the  balanced  assignment  problem,  (l)-(4)  is 
(10)  and  (11). 

SEMI,  the  FORTRAN  code  used  to  solve  the  pure  assignment  problems  in  the 
previous  sections  of  this  paper  is  the  software  implementation  of  a  shortest  aug¬ 
menting  path  algorithm  developed  by  Kennington  and  Wang  [1990a].  The  algo¬ 
rithm  is  a  dual  method  and  consists  of  four  phases:  column  reduction,  reduction 
transfer,  row  reduction  augmentation,  and  shonest  path  augmentation.  In  each 
phase  both  dual  feasibility,  Cij->li-jrj  >  0  for  all  (i,j)  €  E  and  complementary 
slackness  Xij(Cij  - /Ij  -  ;Tj)  =  0  for  all  (i,j)  e  E  are  maintained  and  the  procedure 
works  toward  obtaining  primal  feasibility,  (2)  and  (3).  Minor  modifications  to 
SEMI  were  incorporated  so  that  jtj  <  0  for  all  j  was  also  maintained  throughout  the 
four  phases.  The  modified  code  is  called  UNBAL_SEMI. 
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Table  8  presents  our  empirical  results  comparing  SEMI  and  UNBAL_SEMI 
for  the  assignment  problem  and  presents  results  for  the  singly  constrained  unbal¬ 
anced  assignment  problem.  Test  runs  were  performed  on  an  IBM  308 ID  and  a 
Sequent  Symmetry  S81.  Every  entry  in  columns  2-6  of  Table  8  is  the  average  run 
time  for  three  randomly  generated  problems  except  for  the  entries  in  the  last  row 
which  are  for  a  singly  constrained  unbalanced  assignment  problem  provided  by  the 
Navy  Personnel  Research  and  Development  Center  in  San  Diego. 

By  adding  dummy  nodes  and  anificial  arcs,  one  can  always  convert  an  unbal¬ 
anced  problem  to  a  balanced  one.  As  shown  in  Table  8,  the  specialized  code  for 
the  unbalanced  problem  can  run  four  times  faster  than  the  corresponding  balanced 
code.  For  the  400x600  assignment  problems,  UNBAL_SEMI  was  four  times  faster 
than  SEMI  on  problems  in  which  200  dummy  men  and  120,000  dummy  arcs  were 
appended.  We  also  find  that  for  this  application,  the  IBM  308 ID  is  approximately 
twice  as  fast  as  the  Sequent  Symmetry  S81. 


Table  8.  CPU  times  (sec.)  on  an  IBM  3081D  and  wall  clock  times  (sec.)  on  a 
Sequent  Symmetry  S81  for  balanced  and  unbalanced  pure  assignment 
problems  and  singly  constrained  unbalanced  assignment  problems. 


Size 
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Assignment-fl  Side  Constraint 

SEMI’ 

UNBAL_ASSIGN+1 

IBM 

100x100 

0.11 

0.19 

g 

0.28 

1.89 

3.31 

100x200 

0.44 

1.28 

0.16 

1.39 

2.59 

100x300 

1.79 

3.70 

D 

0.21 

2.21 

4.17 

200x200 

0.44 

0.69 

0.63 

1.18 

6.84 

9.51 

200x300 

1.07 

2.16 

0.22 

0.49 

4.76 

8.83 

200x400 

2.36 

5.13 

0.26 

0.59 

5.86 

11.39 

300x300 

1.22 

2.14 

1.74 

3.37 

15.76 

32.78 

300x400 

1.70 

3.35 

0.52 

1.15 

9.92 

18.90 

300x500 

3.22 

6.91 

0.53 

1.18 

10.84 

25.46 

400x400 

2.30 

4.18 

3.78 

7.86 

37.83 

74.09 

400x500 

2.23 

4.63 

0.88 

1.92 

16.02 

31.86 

400x600 

3.68 

8.80 

0.90 

2.00 

16.90 

35.72 

98x3622 

NA 

NA 

NA 

NA 

0.28 

0.59 

Total 

20.56 

43.16 

10.04 

20.39 

130.50 

258.74 

1  dummy  men  nodes  and  artificial  arcs  are  added  to  balance  men  and  jobs. 

2  real  problem  provided  by  the  Navy. 


VI.  SUMMARY  AND  CONCLUSIONS 
We  have  presented  a  new  algorithm,  ASSIGN+1,  for  the  singly  constrained 
assignment  problem.  This  algorithm  is  applicable  for  balanced  and  unbalanced 
assignment  problems  having  either  an  inequality  or  an  equality  side  constraint.  The 
algorithm  uses  Lagrangean  relaxation  and  solves  a  series  of  pure  sparse  assign¬ 
ment  problems. 

The  empirical  results  of  test  runs  of  the  FORTRAN  implementation  of  the 
algorithm  for  balanced  problems  with  inequality  and  equality  side  constraints  and 
unbalanced  problems  with  an  inequality  side  constraint  indicate  that  these  three 
codes  are  very  robust  and  need  very  few  iterations  to  satisfy  the  optimality  criteria. 
Remarkably,  near  optimal  solutions  to  integer  programs  with  over  one-half  million 
binary  variables  are  obtained  in  less  than  fifteen  minutes  on  a  Sequent  Symmetry 
S81  using  a  single  processor.  The  results  from  test  runs  of  ASSIGN+1,  MPSX,  and 
OSL  proves  the  superiority  of  the  new  algorithms  over  state-of-the-art  general 
purpose  software. 
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The  objective  of  this  research  was  to  develop  and  empirically  test  new 
simplex'based  parallel  algorithms  for  the  generalized  network  optimization 
problem.  One  of  these  algorithms  is  essentially  a  “data  parallel”  method 
in  which  each  processor  executes  identical  code  on  a  portion  of  the  data. 
(However,  since  the  data  sets  are  not  necessarily  disjoint,  “locks”  are  used 
to  ensure  exclusive  access.)  A  second  algorithm  exhibits  “control  parallelism”, 
using  different  processors  to  simultaneously  execute  the  different  subtasks  of 
the  simplex  method.  “Locks”  are  not  needed  in  this  second  approach,  but, 
instead,  at  the  beginning  of  each  pivot,  an  “audit”  of  the  proposed  entering 
arc  is  performed  in  order  to  ensure  correctness  of  the  method.  These  par¬ 
allel  algorithms  were  implemented  on  the  Sequent  Symmetry  multiprocessor, 
empirically  tested  on  a  variety  of  problems  produced  by  two  random  prob¬ 
lem  generators,  and  compared  with  two  leading  state-of-the-art  serial  codes. 
Good  speedups  were  obtained  relative  to  the  serial  codes,  and  the  relative 
performance  of  the  two  parallel  methods  was  found  to  be  dependent  on  con¬ 
nectedness  properties  of  the  optimal  solutions  of  the  test  problems.  The  largest 
test  problem,  a  generalized  transportation  problem  having  30,000  nodes  and 
1.2  million  arcs  was  optimized  in  approximately  eleven  minutes  by  our  parallel 
code,  displaying  a  speedup  of  13  on  15  processors. 
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The  generalized  network  flow  problem  (also  czdled  the  flow  with  gains  model)  in  its 
most  general  form  is  defined  as  follows: 

min  cx 

X 

s.t.  Gx  =  b  (GN) 

0  <  X  <  u 

where  G  is  an  m  x  n  matrix  having  at  most  two  nonzero  entries  in  each  column,  c  is  an  n- 
vector  of  costs,  6  is  an  m- vector  of  right-hand-sides,  and  u  is  an  n- vector  of  upper  bounds. 
Associated  with  the  matrix  C?  is  a  generalized  graph  [A”,  A],  where  N  is  a  set  of  nodes  and 
A  is  a  set  consisting  of  pairs  of  nodes  (arcs)  and  (possibly)  singletons  (root  axes).  The 
nodes  correspond  to  the  rows  of  G  and  the  arcs  and  root  arcs  correspond  to  columns  of 
G.  (To  simplify  the  discussion,  we  will  use  the  term  “arcs”  to  refer  to  both  “ordinary” 
axes  (which  connect  two  nodes)  and  root  arcs,  which  axe  incident  to  only  one  node  and 
correspond  to  columns  of  G  with  only  one  non- zero  element.)  As  with  the  pure  network 
flow  problem,  which  we  designate  a^  PN,  the  simplex  algorithm  for  GN  can  be  executed 
on  a  graph.  The  graph  of  a  basis  for  GN  is  a  collection  (forest)  of  one  or  more  quasi-trees, 
where  a  quasi-tree  is  a  tree  with  exactly  one  additional  arc  (making  it  either  a  rooted  tree 
or  a  tree  with  exactly  one  cycle).  This  structural  property  plays  an  important  role  in  one 
of  the  pairallel  algorithms  described  below.  Figure  1  shows  a  forest  of  quasi- trees. 


Figure  1  A  Forest  of  Quasi-IVees 
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1.  Survey  and  Overview 

The  generalized  network  model  can  be  used  to  optimize  network  problems  found  in 
the  areas  of  investment  planning,  job  scheduling,  pure  network  optimization  and  others. 
The  applications  are  characterized  by  networks  for  which  any  arc  may  gain  or  lose  flow  at 
a  linear  rate  assigned  to  that  arc.  Profit  from  interest  or  dividends  C2in  be  modeled  by  a 
network  with  gains,  and  loss  from  evaporation  or  seepage  can  be  modeled  by  a  network 
with  losses.  A  generalized  network  without  gains  or  losses  is  a  pure  network.  Further 
discussion  of  applications  can  be  fotmd  in  Glover  et  al.  [15]  aind  Mulvey  and  Zenios  [23]. 
The  graphical  structure  of  a  basis  for  G  allows  the  use  of  labeling  procedures  for  baisis 
representation.  Glover,  Klingman,  and  Stutz  [16]  developed  the  first  specialized  primal 
simplex  code  (NETG)  which  exploited  this  graphical  structure.  Many  theoretical  and 
computational  improvements  have  been  made  to  this  code  over  the  last  fifteen  years  (see 
Glover  et  al.  [15])  and  Elam  et  al.  [13]).  A  similar  implementation  was  also  developed 
in  Langley  [22].  Adolphson  and  Heum  [1]  presented  computational  results  with  their 
generc  lized  code  which  used  an  extension  of  the  threaded  index  method  of  Glover  et  al.  [17] . 
Brown  and  McBride  presented  the  details  of  their  generalized  network  code  (GENNET) 
in  [5].  Tomlin  [26]  developed  the  first  assembly  language  code  which  is  part  of  Ketron’s 
Mi  S  III  system.  Recently,  other  codes  have  been  developed  by  Engquist  and  Chang  [14]), 
Mulvey  and  Zenios  [23]),  and  by  Ali,  Charnes,  and  Song  [3]).  The  first  parallel  generalized 
code  was  developed  by  Chang,  Enquist,  Finkel,  and  Meyer  [9])  for  the  Wisconsin  CRYST.A.L 
Multicomputer,  zmd  the  second  (see  Clark  and  Meyer  [11])  for  the  Sequent  21000,  also  at 
the  University  of  Wisconsin.  The  first  C  language  code  is  discussed  in  Nulty  and  Trick 
[24].  Another  eissembly  language  code  is  discussed  in  Chang,  et  al,  [7].  The  serial  codes 
GENFLO,  a  modification  of  GENNET,  and  GRNET2  axe  discussed  in  Ramamurti  [25] 
and  Clark  and  Meyer  [12],  respectively.  Computational  results  for  these  two  codes  will 
be  given  in  Section  3.  TPGRNET,  a  parallel  code  that  assigns  distinct  tasks  to  different 
processes  is  discussed  in  Clark  and  Meyer  [12]  and  additional  results  for  this  code  are  given 
in  Section  3.  A  summtiry  of  this  prior  software  may  be  foimd  in  Table  I. 

In  Section  2,  a  brief  discussion  of  some  strategies  for  parallelizing  the  primal  sim¬ 
plex  method  will  be  given,  along  with  detailed  discussions  of  two  codes  PGRNET  and 
TPGRNET.  PGRNET  executes  pivots  and  computes  reduced  costs  in  parallel.  TPGRNET 
computes  reduced  costs  in  parallel  and  overlaps  this  with  the  serial  execution  of  pivots. 
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Table  I  Stirvey  of  generalized  network  codes 


Code 

Language 

Authors 

Year 

NETG 

FORTRAN 

Glover,  F.,  Klingman,  D. 
Stutz,  J. 

1973 

FORTRAN 

Langley,  W. 

1973 

FORTRAN 

Adolphson,  D.,  Heum,  L. 

1981 

GENNET 

FORTRAN 

Brown,  G.,  McBride,  R. 

1984 

GWHIZNET 

ASSEMBLER 

Tomlin,  J. 

1984 

GENET 

FORTRAN 

Engquist,  M.,  Chang,  M. 

1985 

LPNETG 

FORTRAN 

Mulvey,  J.,  Zenios,  S. 

1985 

FORTRAN 

Ali,  I.,  Chames,  A. 
Song,  T. 

1986 

GRNET-K 

(parallel) 

FORTRAN 

Chang,  M.,  Engquist,  M. 
Finkel,  R.,  Meyer,  R. 

1987 

PGRNET 

(parallel) 

FORTRAN 

Clark,  R.,  Meyer,  R. 
Chang,  M. 

1987 

GNO/PC 

C 

Nulty,  W.,  Trick,  M. 

1988 

GRNET-A 

ASSEMBLER 

Chang,  M.,  Cheng,  M. 
Chen,  C. 

1988 

GENFLO 

FORTRAN 

Ramamurti,  M. 

1989 

GRNET2 

(serial) 

FORTRAN 

Clark,  R.,  Meyer,  R. 
Chang,  M. 

1989 

TPGRNET 

(paxeillel) 

FORTRAN 

Clark,  R.,  Meyer,  R. 

1989 

The  test  problems  discussed  in  Section  3  are  generated  by  1)  NETGEN  [21],  a  pure 
network  problem  generator,  2)  GNETGEN,  a  generalized  network  problem  generator  based 
on  NETGEN,  and  3)  MAGEN  [12],  a  generalized  network  problem  generator  based  on 
GTGEN  [8]. 

In  Section  3,  it  is  established  that  the  two  serial  codes  GRNET2  and  GENFLO  axe 
competitive  with  GENNET,  a  state-of-the  art  generalized  network  code  discussed  in  Brown 
and  McBride  [5].  This  comparison  is  made  by  giving  results  for  NETGEN  and  GNETGEN 
problems.  Next,  results  axe  given  for  TPGRNET  for  a  group  of  transshipment  problems 
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generated  by  GNETGEN  with  up  to  6,000  nodes  and  up  to  50,000  arcs.  Results  are  then 
given  for  PGRNET  and  TPGRNET  for  a  group  of  generalized  transportation  problems 
generated  by  MAGEN.  All  of  these  problems  have  30,000  nodes  eind  over  320,000  arcs. 
These  problems  are  much  more  difficult  than  the  GNETGEN  problems  in  terms  of  both 
size  and  the  pivots/nodes  ratio.  In  addition,  the  variation  in  the  granularity  of  the  optimal 
solutions  of  these  problems  allows  a  good  compzirison  to  be  made  of  the  suitability  of  the 
two  parallel  approaches  as  a  function  of  solution  granularity. 


2  SIMPLEX  ALGORITHMS  FOR  GENERALIZED  NETWORKS 


2.1  Serial  Primal  Simplex  Algorithms 

In  this  section  we  briefly  discuss  the  specieilization  of  the  primal  simplex  method  for 
generalized  networks.  A  more  detailed  discussion  can  be  fotind  in  Kennington  and  Helgason 
[20]  and  Jensen  and  Barnes  [19]. 

Input: 

1.  A  generalized  graph  [IV,  A]. 

2.  A  cost  c[a]  and  arc  capacity  u[a\  for  each  arc  a  G  A. 

3.  The  constraint  matrix  G. 

4.  A  right-hand-side  value  6[n]  for  all  n  €  IV. 

Output: 

1.  The  termination  type  indicator  0  and  flow  array  x[a] .  (/?  =  1  implies  that  the  problem 
is  unbounded,  ^  =  2  implies  that  the  problem  has  no  feasible  solution,  and  =  3 
implies  that  the  optimal  solution  is  given  in  x[a].) 

The  primal  simplex  algorithm  for  generalized  networks  can  be  partitioned  into  three 
subroutines,  PRICE,  RATIO  and  UPDATE.  These  correspond  to  the  computation  of  re¬ 
duced  costs,  the  ratio  test,  and  the  basis  update  in  the  simplex  method  for  genered  LP’s. 
Each  of  the  subroutines  makes  use  of  the  block  diagonal  (and  nearly  trianguleu")  nature  of 
the  bases  for  (GN),  as  discussed  in  Adolphson  [2]  and  Benr,  Glover  and  Klingman  [4].  The 
primal  simplex  algorithm  can  be  summarized  as  follows: 

Procedure  SIMPLEX 
begin 

1.  0i-O 

2.  initialize  duals  (tt) 

3.  call  module  PRICE 

4.  if  ^  0,  then  terminate 

5.  call  module  RATIO 

6.  if  9^  0  terminate 

7.  call  module  UPDATE 

8.  goto  3. 


In  module  PRICE,  reduced  costs  are  computed  for  arcs  (variables)  by  using  the  for¬ 
mula  rij  =  (c  —  irG)ij,  where  rij  is  the  reduced  cost  for  arc  (*,  j).  The  expression  (c  — 7rG),j 
has  at  most  three  terms,  since  G  has  at  most  two  non-zero  entries  in  each  column.  The 
hexiristics  employed  by  GRNET2,  PGRNET  and  TPGRNET  to  determine  which  arcs  to 
price  and  to  use  as  pivots  will  be  discussed  later,  amd  involve  maintaining  candidate  lists 
of  pivot  eligible  arcs  in  very  different  ways.  In  fact,  the  parallel  approaches,  because  of 
their  asynchronous  behavior,  produce  pivot  sequences  that  wotild  be  difficult  if  not  im¬ 
possible  to  replicate  in  a  serial  algorithm.  In  module  RATIO,  the  ratio  test  for  a  given 
pivot-eligible  au'c  is  performed  by  identifying  the  incident  quasi-trees  (i.e.  the  incident  baisis 
components)  and  following  the  path  from  each  end  of  the  arc  to  the  (generalized)  root  of 
the  corresponding  quasi-tree(s).  As  this  traversal  is  made,  the  flow  on  the  basic  arcs  in 
the  path  is  checked,  and  the  arc  with  the  “minimum  ratio”  is  selected  as  the  outgoing  arc. 
In  module  UPDATE,  flows  as  well  as  other  tree  data  structures  are  updated. 

GENFLO  and  GRNET2  are  implementations  of  this  zdgorithm,  and  they  will  be  shown 
to  be  comparable  in  speed.  However,  the  two  codes  differ  in  a  few  ways.  GENFLO  is  a 
modiflcation  of  GENNET,  described  in  Brown  and  McBride  [5]  and  it  uses  the  GENNET 
pricing  strategy.  This  strategy  involves  selecting  a  node  and  pricing  out  edl  of  its  incident 
arcs.  GRNET2,  on  the  other  hand,  scans  its  list  of  arcs  linearly  to  locate  pivot  eligible 
axes  and  doesn’t  attempt  to  focus  on  the  arcs  that  are  adjacent  to  some  node.  GRNET2  is 
specialized  to  solve  problems  for  which  at  least  one  of  the  multipliers  defined  for  each  arc 
is  equal  to  1,  while  GENFLO  allows  each  arc  to  have  two  arbitrary  associated  multipliers. 
The  latter  approach  is  desirable  in  integer  programming  applications  since  scailing  variables 
to  make  one  of  the  multipliers  equal  to  1  may  destroy  integrality.  Also,  reflecting  an  arc 
to  convert  a  negative  multiplier  into  a  1  requires  that  the  arcs  have  upper  bounds.  Both 
GENNET  and  GRNET2  use  the  “little  m”  (or  “gradual  peneJty”)  method  [18]  to  find  a 
feasible  solution.  Under  this  method,  a  moderate  initial  cost  is  given  to  the  artificial  arcs, 
and  the  resulting  problem  is  approximately  solved.  Next,  the  cost  on  the  zirtificial  arcs 
is  increased  to  create  a  new  problem,  and  the  optimal  (or  nezu-ly  optimal)  basic  fezisible 
solution  from  the  last  problem  is  used  as  a  warm  start  for  the  new  one.  This  process 
of  gradually  increasing  the  cost  on  the  artificial  arcs  and  solving  a  sequence  of  “easy” 
problems  can  be  shown  empirically  to  yield  a  vast  improvement  over  the  “big  M”  method 
in  terms  of  the  total  number  of  pivots  required  to  solve  a  problem  and  in  terms  of  the 
total  CPU  time.  Some  tests  with  GRNET2  have  shown  that  the  “little  m”  method  is  29 
times  faster  than  the  “big  M”  method  for  large  problems  having  only  one  quasi  tree  in  the 
optimal  basis.  GENFLO  uses  a  simple  closed-form  expression  to  calculate  the  cost  of  the 


axtificial  axes  at  each  iteration.  (The  initial  cost  for  the  GENFLO  artificial  arcs  is  about 
200.  The  cost  on  the  artificial  arcs  is  then  roughly  doubled  at  each  step  in  the  gradual 
penailty  method.)  The  initial  cost  for  the  GRNET2  artificial  arcs  is  20  (assuming  that  the 
cost  range  for  the  regular  arcs  is  1-100).  GRNET2  adds  5  to  the  cost  on  the  artificial  arcs 
after  each  step  vmtil  the  cost  reaches  130,  then  for  the  next  two  steps  the  increment  is  by 
10,  and  for  the  last  two  steps  the  increment  is  by  20.  Finally,  the  cost  is  increased  to  “big 
M”  and  the  problem  is  solved  to  optimality.  An  empirical  comparison  of  these  two  codes 
with  each  other  and  with  MPSX,  the  IBM  general  LP  code,  is  given  below. 

2.2  Parallel  Simplex  Algorithms 

In  Clark  and  Meyer  [11]  an  implementation  of  PGRNET  is  discussed.  This  code 
executes  in  parallel  both  pivots  and  pricing.  Pivots  are  executed  in  parallel  only  if  they 
involve  updating  separate  quasi-trees  (basis  components).  Even  if  the  basis  has  only  one 
component  at  optimality,  this  algorithm  behaves  quite  efficiently  during  the  beginning 
of  the  solution  process,  because  the  initial  starting  basis  has  as  many  components  as 
nodes.  PGRNET  is  used  in  Clark  and  Meyer  [11]  to  solve  problems  generated  randomly  by 
GTGEN  [8],  and  a  code  called  MPGRNET  is  used  to  solve  randomly  generated  multiperiod 
problems  with  a  block  diagonal  structure  generated  by  MPGEN  [6].  MPGRNET  reduces 
contention  between  processors  by  allocating  specific  quasi-trees  to  specific  processors  and 
allowing  processors  to  execute  pivots  involving  their  quasi-trees  (and  only  their  quasi-trees) 
until  they  can  find  no  “local”  pivot  eligible  arcs.  Optimality  is  then  achieved  by  reverting 
to  the  PGRNET  algorithm.  Speedups  for  PGRNET  in  Clark  and  Meyer  [11]  range  from 
4.8  to  8.8  on  7  processors,  and  speedups  for  MPGRNET  ranged  from  8.8  to  36.9  on  12 
processors.  The  superlinear  speedup  for  some  problems  led  the  authors  to  develop  a  serial 
program  that  was  much  more  efficient  for  the  block  diagonal,  or  “multi-period”  problems. 
The  restilting  serial  timing  results  yielded  speedup  results  for  MPGRNET  that  were  slightly 
sublinear.  An  improved  version  of  PGRNET  is  discussed  in  Clark  and  Meyer  [12],  and  in 
Section  2.3  below. 

A  number  of  parallel  algorithms  for  GN  are  discussed  in  Ramamurti  [25].  The  “Chaotic 
Column  Partitioning  Algorithm”  (CCP)  is  similar  to  PGRNET  in  that  it  allows  pivots  to 
be  executed  in  parallel,  provided  that  the  pivots  involve  updating  separate  quzisi-trees. 
Another  algorithm,  known  as  the  “Column  Partitioning  Algorithm”  (CP)  is  similar  to 
MPGRNET.  The  (CCP)  algorithm,  the  (CP)  algorithm  and  other  algorithms  aire  applied 
in  Ramamurti  [25]  to  problems  generated  randomly  by  GNETGEN,  a  modification  of 
NETGEN  [21].  Speedups  for  the  (CCP)  algorithm  on  8  processors  range  from  1.27  to 
1.73,  amd  speedups  for  (CP)  on  8  processors  range  from  1.33  to  2.48. 


In  Section  2.4  we  discuss  TPGRNET  [12],  an  algorithm  that  devotes  one  processor  to 
the  task  of  executing  pivots,  and  devotes  all  other  processors  to  the  task  of  computing  re¬ 
duced  costs  and  managing  candidate  lists.  (TPGRNET  denotes  “Task  Parallel  GRNET”.) 
An  algorithm  (the  Data  Partitioning  Algorithm)  that  is  similar  to  TPGRNET  in  its  par¬ 
titioning  of  tasks  is  discussed  in  Ramamiirti  [25]  and  is  applied  to  a  set  of  generalized 
networks  defined  on  grids.  The  advantage  to  having  one  processor  do  all  pivoting  is  that 
there  is  no  contention  between  processors  for  shared  data  structures,  even  when  there  is 
only  one  basis  component  in  the  optimal  basis.  This  strategy  yields  an  algorithm  that  is 
robust  in  the  sense  that  it  has  a  behavior  that  does  not  depend  heavily  on  the  nature  of 
the  optimal  basis,  and  hence  is  appropriate  for  arbitrary  generalized  networks. 

2.3  PGRNET  (Parallel  GRNET) 

Figure  2  gives  a  flow  chart  for  PGRNET.  Parallel  portions  of  the  code  are  emphasized 
by  parallel  lines,  “l.t.”  designates  list -threshold,  a  candidate  list  parameter. 


Figure  2. 


Flow  Chart  For  Parallel  Algorithm  PGRNET 


The  (parallel)  PGRNET  algorithm  can  be  summarized  as  follows: 

PGRNET 

INITIALIZATION 

In  parallel,  generate  the  initial  flows  on  the  artificial  arcs.  Divide  the  problem  arcs 
into  roughly  equal-sized  segments  for  pricing  in  the  next  stage. 

STAGE  1  {parallel  pivoting  with  candidate  lists) 

Asynchronously  and  in  parallel  scan  the  segments  of  the  Eire  set  to  develop  multiple 
candidate  lists.  Pivot  arcs  are  selected  from  the  candidate  lists,  and  queisi- trees  are 
“locked”  before  pivots  are  made.  (In  a  sheu'ed  memory  environment,  a  locking  oper¬ 
ation  by  a  processor  on  the  portion  of  the  shared  data  corresponding  to  the  one  or 
two  selected  quasi-trees  serves  to  ensure  exclusive  access  to  those  quasi-trees  by  the 
locking  processor  .)  When,  for  a  particular  segment  of  the  arc  set,  it  is  not  possible 
to  develop  a  candidate  list  with  more  than  list. threshold  entries,  check  the  penalty 
on  the  artificial  arcs.  If  this  penalty  has  reached  its  maximum  value  go  to  STAGE  2. 
Otherwise,  assign  a  new  value  to  the  penalty  of  the  artificial  arcs,  update  the  duals 
in  parallel  zmd  continue  asynchronous  pivoting. 

STAGE  2  {parallel  verification  of  optimality) 

Scan  the  segments  of  the  arc  fist  in  parallel  to  locate  pivot-eligible  arcs.  If  a  pivot- 
eligible  arc  is  found,  lock  the  associated  quasi-trees,  and  execute  the  pivot  (if  the 
quasi-trees  were  successfully  locked).  If  an  entire  sweep  through  the  segments  can  be 
made  without  finding  any  pivot-eligible  arcs,  optimality  htis  been  reached. 


Arcs  are  partitioned  roughly  equally  among  segments.  If  there  are  n  arcs  and  P 
segments,  then  processor  1  hais  arcs  (1)  through  \n/P],  processor  2  gets  arcs  fn/P]  -1- 1 
through  f2n/P]  and  so  forth.  (A  more  sophisticated  allocation  of  the  non-artificial  arcs 
could  be  based  on  an  approximation  of  the  topology  of  the  optimal  solution.  In  the  limiting 
case,  if  the  optimal  topology  is  known,  lock  contention  could  be  reduced  significantly  by 
allocating  to  one  processor  aU  of  the  arcs  between  nodes  of  a  given  queisi-tree  or  group  of 
quasi-trees.  This  approach  could  also  be  used  to  solve  perturbed  problems  more  efficiently 
by  using  the  optimal  topology  of  the  base  case.)  The  dual  variables  of  all  the  nodes,  the 
predecessor  threads,  the  successor  threads  and  all  other  tree  functions  required  by  the 
generalized  network  simplex  method  are  stored  in  shared  memory  and  are  available  to 
all  processors.  It  is  important  to  emphasize  that  only  the  acquisition  of  problem  data 
(i.e.,  generating  data  or  reading  data)  is  done  serially,  and  the  solution  process  is  entirely 
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parallel.  The  number  of  partitions  is  equal  to  the  number  of  processors,  and  all  processors 
execute  the  same  code,  which  is  almost  identical  to  that  of  GRNET2.  The  major  differences 
in  the  parallel  code  are  the  quasi-tree  locking,  the  “audit”  of  the  reduced  cost  of  the 
proposed  entering  arcs,  and  the  distributed  dual  update.  The  asynchronicity  of  the  quasi¬ 
tree  locking  and  its  effect  on  pivot  selection  make  this  parallel  approach  very  difficult  to 
emulate  on  a  serial  computer.  We  now  present  some  additional  detail  on  this  parallel 
approach. 

During  a  parallel  pivoting  stage.  Stage  1,  each  processor  makes  its  own  candidate  list 
of  pivot-  eligible  arcs  from  the  arcs  assigned  to  it.  These  candidate  lists  are  made  in  the 
same  way  that  candidate  lists  are  made  in  GRNET2.  Each  processor  p  chooses  its  next 
pivot  arc  from  its  candidate  list  by  selecting  the  pivot-eligible  arc  with  the  greatest  reduced 
cost  in  absolute  value.  If  the  quasi-trees  at  the  ends  of  the  arc  have  not  been  locked  by 
another  processor,  p  locks  the  quasi-trees  to  keep  other  processors  from  interfering  %vith 
the  tree  update,  checks  the  reduced  cost,  and,  if  the  arc  is  still  pivot  eligible,  performs  the 
pivot  and  removes  the  arc  from  the  candidate  list.  If  the  quasi-trees  Eire  already  locked, 
or  if  the  arc  is  no  longer  pivot  eligible,  processor  p  removes  the  arc  from  the  candidate 
list  and  chooses  another  arc.  The  dual  update  pau-t  of  the  pivot  operation  has  also  been 
parallelized  as  follows:  a  processor  traversing  a  quasi-tree  to  update  duals  puts  roots  of 
“large”  subtrees  that  are  encountered  on  a  shared  queue;  other  processors  periodically 
check  the  queue,  and  when  it  is  non-empty,  take  a  subtree  off  the  the  queue  and  perform 
the  dual  update  on  the  subtree.  When  the  candidate  list  belonging  to  p  has  no  more  than 
list  .threshold  arcs,  p  develops  a  new  candidate  list.  If  the  new  candidate  list  also  has 
no  more  than  list. threshold  entries,  processor  p  sets  a  flag  in  shared  memory  to  indicate 
that  it  is  having  difficulty  finding  pivot-eligible  arcs.  This  flag  is  checked  frequently  by 
all  processors,  and  when  it  is  set,  processor  1  checks  to  see  if  the  penedty  on  the  artificial 
arcs  is  big  M.  If  the  penalty  is  big  M,  all  processors  enter  Stage  2.  If  the  penalty  is 
smaller,  then  the  processors  increment  the  penalty  on  the  artificial  arcs  and  cooperate  in 
recomputing  the  dual  variables.  Then  zJl  processors  develop  new  candidate  lists. 

Stage  2  of  PGRNET  corresponds  to  a  verification  of  optimality  stage.  The  verification 
of  optimzdity  is  done  in  parallel,  and  all  processors  execute  the  same  tasks.  Optimality  is 
achieved  by  performing  any  remaining  pivots.  Processors  sweep  through  their  segments 
looking  for  pivot-eligible  arcs,  but  no  candidate  lists  are  developed.  If  processor  p  finds  a 
pivot-eligible  arc,  it  locks  the  quasi-trees  at  either  end  of  the  arc,  executes  the  pivot,  and 
indicates  to  the  other  processors  that  they  must  restart  their  sweep  (by  setting  flags  in  a 
shared  array).  This  restart  mechanism  is  needed  because  a  pivot  executed  by  processor 


p  might  cause  an  arc  owned  by  another  processor  to  become  pivot-eligible.  If  processor 
p  finds  that  one  of  the  trees  at  the  ends  of  a  pivot-eligible  arc  is  locked,  it  sets  the  other 
processors  restart  flags  and  restarts  its  own  sweep.  Each  processor  checks  its  restart  flag 
frequently  during  Stage  2,  and  when  a  processor  finds  that  its  flag  has  been  set,  it  marks 
the  arc  in  its  segment  that  was  last  priced  ,  and  continues  pricing.  If  the  processor  prices 
all  of  its  axes  up  to  the  marked  arc  without  finding  any  to  be  pivot-eligible  and  without 
finding  its  restart  flag  to  be  set,  that  processor  informs  the  others  that  none  of  its  arcs  are 
pivot-eligible.  Optimality  is  reached  when  all  processors  make  a  sweep  through  their  arcs 
without  finding  their  restart  flags  set,  and  without  finding  any  zircs  that  are  pivot-eligible. 

PGRNET  is  thus  an  example  of  data  or  uniform  parallelism.  The  results  in  Section  3 
show  that  uniform  parallelism  works  well  for  generalized  network  flow  problems  in  which 
the  number  of  quasi-trees  in  the  optimal  solution  is  not  too  small. 

2.4  TPGRNET  (Task-Parallel  GRNET) 

This  algorithm  is  divided  into  two  stages,  with  97%  of  all  pivots  executed  in  Stage  1. 
During  Stage  1,  different  tasks  are  allocated  to  different  processors  (see  Figure  3).  One 
processor  executes  all  pivots,  one  processor  selects  pivot  arcs  for  the  pivoting  processor, 
and  all  other  processors  do  pricing  and  place  pivot  eligible  arcs  into  a  shared  candidate  list 
to  be  scanned  by  the  selecting  processor.  If  a  pivot  requires  updating  a  large  quasi-tree,  the 
pivoting  processor  can  enlist  the  help  of  the  pricing  processors  by  putting  the  root  nodes 
of  subtrees  on  a  queue.  When  these  root  nodes  are  detected  by  the  pricing  processors 
in  the  course  of  a  periodic  check,  they  take  them  off  the  queue  and  update  the  duals  in 
the  corresponding  subtrees.  Stage  2,  in  which  only  about  3%  of  the  pivots  axe  done,  is 
analogous  to  that  of  the  data-parallel  PGRNET  approach  in  that  each  processor  scans  a 
segment  of  the  arc  list  belonging  to  that  processor.  As  in  PGRNET,  when  a  processor  finds 
a  pivot  eligible  arc,  it  locks  the  quasi-trees  at  the  ends  of  the  arc,  to  temporarily  exclude 
all  other  processors  from  modifying  those  quasi-trees,  checks  the  reduced  cost,  and,  if  it  is 
still  pivot  eligible,  executes  the  pivot.  We  revert  to  the  data  parallel  approach  and  develop 
no  candidate  lists  in  Stage  2  because  very  few  pivot  eligible  arcs  exist  at  this  stage.  More 
details  of  the  algorithm  are  given  below.  Figure  3  illustrates  the  flow  of  information  during 
Stage  1,  and  Figure  4  gives  a  flow  chart  for  TPGRNET.  In  both  figures,  dotted  arrows 
indicate  the  direction  of  the  flow  of  information. 


Figure  3  Flow  of  Information  in  TPGRNET,  Stage  1 


Figure  4  Flow  Chart  for  Parallel  Algorithm  TPGRNET 


The  (parallel)  TPGRNET  algorithm  is: 


TPGRNET 


INITIALIZATION 

In  parallel,  generate  the  initial  flows  on  the  artificial  arcs.  Divide  the  problem  aircs 
into  roughly  equal-sized  segments  for  pricing  during  the  next  stage. 

STAGE  1  {parallel  candidate  list  development  overlapped  with  serial  pivoting) 

A  set  of  candidate  lists  is  developed  and  prioritized  in  parallel.  This  process  is  con¬ 
tinued  during  the  pivot,  which  concmrently  modifies  some  of  the  duals  being  used  in 
candidate  list  development.  When  the  pivot  is  completed,  the  next  arc  to  enter  the 
basis  is  selected  by  using  the  “best”  azc  from  the  candidate  list  (if  this  arc  has  a  suffi¬ 
ciently  good  reduced  cost)  or  a  different  arc  (generally  from  the  candidate  lists-details 
are  given  below)  if  this  is  not  possible.  The  latter  case  occurs  very  infrequently,  and 
under  conditions  to  be  described  below,  may  trigger  an  increaise  in  the  penalty  cost 
or  an  exit  to  Stage  2. 

STAGE  2  {parallel  pivoting  and  verification  of  optimality) 

Scan  the  segments  of  the  arc  list  in  parallel  to  locate  pivot  eligible  arcs.  If  a  pivot 
eligible  arc  is  found,  try  to  lock  the  associated  quasi-trees,  and,  if  the  quasi-trees  were 
successfully  locked,  execute  the  pivot.  If  an  entire  sweep  through  the  segments  can 
be  made  without  finding  einy  pivot  eligible  zircs,  optimality  has  been  reached. 

We  will  now  describe  in  detail  the  tasks  performed  by  the  individual  processors  during 
Stage  1  of  TPGRNET.  The  pricing  processors  have  the  task  of  computing  reduced  costs 
and  storing  pivot  eligible  arcs  in  shared  candidate  lists  of  length  10.  When  processor  p 
finds  a  pivot  eligible  arc,  it  recomputes  the  reduced  cost  of  the  first  element  in  its  candidate 
list  to  determine  whether  the  new  arc  has  a  larger  reduced  cost  in  absolute  value.  If  it 
does,  the  2u:c  number  gets  stored  in  the  first  element  of  the  array,  amd  the  previous  entry  is 
overwritten.  Experience  has  shown  that  saving  the  previous  entry  yields  no  improvement 
in  efficiency.  If  the  new  arc  has  a  smaller  reduced  cost  than  the  first  arc  in  the  candidate 
list,  the  new  arc  gets  stored  at  a  random  location  in  the  list.  The  pricing  processors  stay 
in  a  loop  that  includes  three  operations.  First,  there  is  the  pricing  operation.  This  uses 
most  of  the  processor’s  CPU  time.  Second,  there  is  a  check  to  determine  if  the  pivoting 
processor  has  put  a  subtree  on  the  dual-update  queue  (because  of  space  limits  this  is  not 
shown  in  the  figures).  Third,  there  is  a  check  to  determine  if  Stage  1  of  the  algorithm  has 
finished. 


C-14 


The  pivot  selecting  processor  has  the  task  of  scanning  the  candidate  lists  of  the  pricing 
processors  to  locate  the  pivot  eligible  arc  with  the  largest  reduced  cost  and  storing  that  arc 
in  a  single  shared  variable  called  best^cand.  This  processor  stays  in  a  loop  that  has  three 
operations.  First,  the  processor  checks  whether  best.cand  is  empty.  If  so,  the  processor 
looks  in  the  first  entry  of  each  of  the  candidate  lists  to  find  an  arc  to  put  in  best.cand. 
Second,  the  processor  traverses  the  candidate  lists  to  determine  if  there  is  a  pivot  eligible 
axe  that  has  a  reduced  cost  larger  than  the  arc  in  be$t.cand.  Third,  there  is  a  check  to 
determine  if  Stage  1  of  the  algorithm  is  fim'shed. 

The  pivoting  processor  stays  in  a  loop  in  which  it  selects  pivot  arcs  for  itself  (as 
described  below),  executes  pivots,  and  directs  the  increases  in  the  penalty  on  the  artificial 
Eircs.  Whenever  possible,  the  pivoting  processor  selects  its  pivot  arc  from  best.cand,  but 
before  accepting  an  arc  from  best.cand,  an  “audit”  is  made  to  determine  if  the  arc  is  still 
pivot  eligible  and  if  the  reduced  cost  is  sufficiently  Icirge  in  absolute  value.  (Note  that  the 
dual  variables  used  by  the  pricing  or  selecting  processor  to  price  the  arc  may  have  been 
changed  dtiring  the  course  of  the  pivot,  so  that  an  “audit”  is  needed  to  ensure  that  the 
proposed  arc  still  provides  a  “good”  pivot.)  If  the  arc  in  best.cand  has  a  small  reduced 
cost,  or  if  there  is  no  arc  in  best.cand,  the  pivoting  processor  looks  at  the  first  entry  of  each 
of  the  candidate  lists  to  find  a  pivot  eligible  arc.  If  a  pivot  eligible  arc  is  found,  the  pivot 
is  executed.  If  no  pivot  eligible  arc  is  found,  then  either  the  penalty  on  the  artificial  arcs  is 
increased,  or  Stage  2  is  begun  (if  the  penalty  has  reached  big  M  and  cannot  be  increased). 
The  pivoting  processor  also  has  the  task  of  directing  the  parallel  update  of  dual  variables 
during  the  execution  of  pivots  and  after  the  penedty  on  the  artificial  variables  has  been 
increased.  During  both  of  these  operations,  the  pivoting  processor  cm  put  the  root  nodes 
of  subtrees  onto  the  dual  update  queue,  and  the  pricing  processors  will  then  eissume  the 
tzisks  of  updating  the  duals  on  those  subtrees. 


3  COMPUTATIONAL  EXPERIENCE 


3.1  Results  for  pure  network  problems 

Pure  networks  are  a  special  case  of  generalized  networks  (all  multipliers  have  a  magni¬ 
tude  of  1).  In  order  to  compare  the  efficiency  of  general  versus  specific  codes,  we  consider 
the  relative  performance  of  a  general  simplex  code  (MPSX),  two  generalized  network  codes, 
and  a  pure  network  code  on  a  class  of  pure  network  test  problems.  Table  II  gives  results 
for  a  collection  of  problems  generated  by  NETGEN  [21].  The  problem  numbers  have  the 
prefix  “N”,  to  indicate  that  they  were  generated  by  NETGEN,  and  a  numerical  suffix  that 
indicates  the  standard  NETGEN  problem  number  [21].  All  times  are  in  seconds,  and  all 
runs  were  made  on  an  IBM  3081-D24.  Although  the  number  of  pivots  executed  by  MPSX 
(the  IBM  proprietary  mathematical  programming  system)  is  roughly  equal  to  the  number 
of  pivots  executed  by  NETFLO  [20],  NETFLO  is  roughly  68  times  faster  than  MPSX  due 
to  the  fact  that  it  is  designed  to  solve  pure  network  problems,  and  it  utilizes  the  tree  struc¬ 
ture  of  bases  and  uses  integer  arithmetic.  GENNET  uses  an  improved  pricing  strategy 
that  reduces  the  total  number  of  pivots  by  a  factor  of  three,  compared  to  MPSX.  Overall. 
GENNET  timings  are  about  54  times  faster  than  MPSX.  The  timings  and  the  number 
of  pivots  for  GENFLO  are  similar  to  those  of  GENNET.  GENFLO  solves  these  problems 
with  fewer  pivots  than  GENNET,  but  CPU  times  are  about  25%  slower,  possibly  due  to 
the  fact  that  GENFLO  allows  for  two  arbitrary  multipliers.  These  results  clearly  justify 
the  utility  of  specialized  generalized  network  software. 


Table  II  Serial  results  for  NETGEN  problems  (IBM  3081-D24) 


Size 

MPSX 

GENFLO 

GENNET 

NETFLO 

Problem 

nodes 

arcs 

pivots 

secs. 

pivots 

secs. 

pivots 

secs. 

pivots 

secs. 

400 

4,500 

2,818 

30.60 

1,288 

1.41 

1,307 

1.19 

2,073 

0.4  7 

400 

1,306 

2,077 

12.00 

593 

0.49 

578 

0.39 

1,079 

0  24 

400 

2,443 

4,229 

29.40 

688 

0.71 

765 

0.53 

1,305 

0.23 

400 

1,416 

3,052 

18.00 

613 

0.52 

504 

0.33 

1.2S4 

0.29 

400 

2,836 

7,073 

57.60 

492 

0.47 

604 

0.45 

1,156 

0.22 

N26 

400 

1,382 

4,286 

24.60 

511 

0.42 

500 

0.27 

917 

0.14 

N27 

400 

2,676 

11,829 

95.40 

628 

0.55 

826 

0.46 

1,730 

0.28 

N28 

2,900 

3,313 

38.40 

1,487 

1.39 

1,732 

1.24 

3,524 

0.93 

N29 

3,400 

3,744 

43.80 

1,889 

1.59 

1,996 

1.18 

4,570 

1.12 

N30 

4,400 

4,954 

60.00 

1,947 

1.87 

1,969 

1.31 

4,346 

1.04 

N31 

4,800 

6,232 

81.00 

2,171 

2.13 

2,347 

1.47 

4,798 

1.13 

N33 

1,500 

4,385 

5,836 

103.20 

2,645 

2.83 

2,521 

2.01 

6,113 

2.16 

N34 

1,500 

5,107 

6,503 

110.40 

2,498 

2.50 

2,943 

2.10 

7,640 

2.37 

N35 

1,500 

5,730 

7,026 

115.80 

3,017 

3.35 

3,310 

2.82 

7,384 

2.30 

Total 

72,972 

820.20 

20,467 

20.23 

21,902 

15.75 

47,919 

3.2  Results  for  GNETGEN  generalized  network  problems 

NETGEN  has  been  modified  by  D.  Klingmem,  to  generate  generalized  network  flow 
problems.  The  modified  generator  is  called  GNETGEN.  Table  III  gives  most  of  the  GNET¬ 
GEN  input  data  for  the  small  test  problems  Gl  through  G7.  The  prefix  “G”  for  these 
problems  indicates  that  they  were  generated  by  GNETGEN.  The  numerical  suffix  corre¬ 
sponds  to  the  problem  numbers  in  Table  2.2  in  Ramamurti  [25].  (The  random  seed  for  all 
of  these  problems  is  13502460.) 


Table  III  Input  data  for  smadl  GNETGEN  problems 


Problem 

Gl 

G2 

G3 

G4 

G5 

G6 

G7 

Nodes 

200 

200 

200 

300 

400 

400 

1,000 

Arcs 

1,500 

4,000 

6,000 

4,000 

5,000 

7,000 

6,000 

Sources 

100 

5 

15 

135 

20 

30 

20 

Sinks 

100 

195 

50 

165 

100 

50 

100 

Cost  Range 

1-100 

1-100 

1-100 

1-100 

1-100 

1-100 

1-100 

Gain  Range 

.5-1.5 

.5-1.5 

.25-.95 

.5-1.5 

.3-1.7 

.5-1.5 

.4-1.4 

Supply 

100k 

100k 

100k 

100k 

100k 

200k 

%  Capacitated 

0 

100 

100 

0 

0 

100 

100 

Bound  Range 

— 

l-2k 

l-2k 

— 

— 

l-2k 

4-6k  1 

_ i 

Table  IV  gives  results  for  MPSX,  GENNET  and  GENFLO  for  problems  Gl  through 
G7.  For  these  problems,  GENNET  is  about  12  times  faster  than  MPSX  and  GENFLO 
about  11  times  faster.  GENFLO  solves  these  problems  with  about  40%  fewer  pivots  than 
MPSX.  Note  that  relaxing  the  assumption  that  one  of  the  multipliers  is  unity  results  in  an 
increase  in  computing  time  of  only  about  10%. 


Table  IV  Serial  results  for  small  GNETGEN  problems  (IBM  30S1-D24) 


Size 

MPSX 

GENFLO 

GENNET 

Prob. 

nodes 

arcs 

pivots 

secs. 

secs. 

pivots 

secs. 

Gl 

1,500 

1,151 

7.80 

533 

0.95 

590 

0.62 

G2 

Hi 

4,000 

550 

3.00 

358 

0.23 

443 

0.22 

G3 

200 

6,000 

2,058 

18.60 

954 

1.53 

1,448 

2.07 

G4 

300 

4,000 

4,112 

47.40 

2,106 

4.23 

2,703 

3.50 

G5 

400 

5,000 

1,870 

26.20 

897 

2.23 

1,229 

2.06 

G6 

400 

7,000 

1,408 

16.80 

1,171 

1.68 

1,591 

1.59 

G7 

1,000 

6,000 

2,811 

40.20 

2,352 

3.60 

3,160 

3.30 

Total 

13,960 

160.00 

8,371 

14.45 

11,164 

13.36 

Tables  V  through  VII  give  the  input  data  for  the  larger  GNETGEN  problems  GS 
through  G22.  These  problems  are  generated  wdth  the  same  input  data  as  problems  1 
through  15  in  Table  4.1a  and  4.1b  in  Ramamurti  [25].  The  problems  are  grouped  according 
to  reciangularity  ratios  (axcs/nodes). 


Table  V  GNETGEN  problems  G8-G13 


Problems 


Characteristics 


Sources 
Sinks 
%  Capacitated 
Cost  Range 
Bound  Range 
Mult.  Range 


G8 

G9 

2,000 

2,000 

13,000 

13,000 

150 

150 

600 

600 

100 

50 

1-100 

1-100 

lk-2k 

lk-2k 

0.5-1.5 

0.5-1.5 

2,00 

3,00 

15 

60 


0 

1-100 


4,000 

26,000 

150 

600 

100 

1-100 

lk-2k 


4,000 

26,000 

150 

600 

50 

1-100 

lk-2k 


Table  VI  GNETGEN  problems  G14-G16 


Problems 


G16 


Characteristics 


Sources 
Sinks 
%  Capacitated 
Cost  Reinge 
Bound  Range 
Bound  Range 


G14 

G15 

6,000 

6,000 

39,000 

39,000 

150 

150 

600 

600 

100 

50 

MOO 

1-100 

lk-2k 

lk-2k 

0.5-1.5 

0.5-1.5 

0.5-1. 5 


Table  VII  GNETGEN  problems  G17-G22 


Problems 


Characteristics 


Nodes 
Arcs 
Sources 
Sinks 
%  Capacitated 
Cost  Range 
Boimd  Range 
Mult.  Range 


MOO 

lk-2k 


G18 


2,000 

25,000 

150 

600 

50 

1-100 

lk-2k 


2,00 

5,00 

15 

60 


0 

1-100 


2,000 

50,000 

150 

600 

100 

1-100 

lk-2k 


2,000 

50,000 

150 

600 

50 

1-100 

lk-2k 


G22 


2,00 
0,00 
15 
60 

0 

1-100 


0.5-1.5  0.5-1.5  0.5-1.5  0.5-1.5  0.5-1.5  0.5-1.5 


Table  VIII  gives  a  comparison  of  GENFLO  and  GRNET2  for  problems  G8  through 
G22.  The  two  programs  give  nearly  the  same  performance  for  these  test  problems,  despite 
the  fact  that  the  two  codes  have  very  different  pricing  strategies.  The  number  of  pivots 
for  GRNET2  is  about  15%  less  than  for  GENFLO,  and  the  total  time  for  GRNET2  on  the 
test  problem  set  is  about  4%  less. 


Table  VIII  Serial  results  for  large  GNETGEN  problems  (Sequent  S81) 


Size 

GENFLO 

GRNET2 

Problem 

nodes 

arcs 

pivots 

time 

pivots 

time 

G8 

5,108 

60.5 

3,892 

G9 

4,454 

44.2 

3,998 

GIO 

2,000 

4,634 

50.2 

3,808 

Gil 

4,000 

26,000 

9,897 

125.0 

7,690 

121.9 

G12 

4,000 

26,000 

9,815 

123.6 

7,460 

115.1 

G13 

4,000 

26,000 

9,145 

99.3 

7,375 

105.2 

G14 

6,000 

39,000 

13,262 

145.4 

10.245 

142.2 

G15 

6,000 

39,000 

12,900 

126.3 

10,059 

158.9 

G16 

6,000 

39,000 

13,653 

141.0 

10,456 

152.0 

G17 

2,000 

25,000 

6,186 

89.2 

6,369 

98.0 

G18 

2,000 

25,000 

6,596 

93.4 

5,260 

78.4 

G19 

2,000 

25,000 

6,629 

119.2 

5,440 

81.3 

G20 

50,000 

8,601 

198.4 

9,608 

194.8 

G21 

50,000 

9,208 

204.5 

10,343 

194.9 

G22 

llll 

50,000 

8,500 

174.2 

7,913 

133.8 

Total 

128,588 

1,794.4 

109,916 

1,723.7 

Table  IX  gives  CPU  times  for  TPGRNET  (run  on  various  numbers  of  processors)  for 
problems  G8  through  G22.  TPGRNET  is  faster  than  the  parallel  versions  of  GENFLO  for 
these  test  problems,  and  it  is  more  robust  in  the  sense  that  CPU  times  usually  decrease 
monotonically  as  the  number  of  processors  is  increased.  Hence,  we  report  here  only  the 
performance  results  for  TPGRNET.  The  column  heading  “cap”  denotes  the  percentage  of 
capacitated  arcs,  and  the  column  heading  “qtree”  denotes  the  number  of  quasi-trees  in  the 
optimal  solution.  The  serial  times  reported  in  the  table  are  taken  from  GRNET2  if  they 
have  the  “T”  prefix,  and  they  are  taken  from  GENFLO  if  they  have  the  “0”  prefix.  The 
serial  time  given  is  always  taken  from  the  faster  of  the  two  codes.  The  time  totals  from 
the  bottom  of  Table  IX  are  graphed  in  Figure  5. 


Figure  5  Composite  results  for  TPGRNET 


Table  IX  TPGRNET  timings  for  problems  G8  through  G22 


Prob 

nds 

arcs 

cap 

qtree 

Number 

1  5 

of 

7 

Processors 

9  11 

(Sequent) 

13  15 

G8 

13k 

wm 

29.3 

19.8 

16.2 

16.3 

16.9 

15.8 

G9 

13k 

24.6 

16.7 

14.0 

12.6 

13.1 

13.S 

GIO 

13k 

0 

25.2 

17.5 

14.2 

12.9 

13.2 

13.3 

Gll 

4k 

26k 

100 

T121.9 

58.3 

43.8 

34.3 

35.6 

33.8 

31.S 

G12 

4k 

26k 

50 

T115.1 

61.2 

41.1 

36.4 

33.5 

31.6 

31.4 

G13 

4k 

26k 

0 

3 

099.3 

56.8 

38.7 

34.0 

29.0 

29.9 

29.6 

G14 

6k 

39k 

100 

5 

87.5 

58.4 

50.3 

44.3 

45.1 

43.8 

G15 

6k 

39k 

50 

7 

87.0 

65.5 

51.6 

48.1 

46.5 

47.0 

G16 

6k 

39k 

0 

2 

0141.0 

89.5 

61.3 

51.3 

45.8 

40.8 

40.8 

G17 

2k 

25k 

100 

9 

089.2 

39.0 

25.2 

20.2 

19.2 

18.1 

16.8 

G18 

2k 

25k 

50 

3 

T78.4 

41.3 

29.8 

22.5 

22.5 

21.2 

19.0 

G19 

2k 

25k 

0 

3 

T81.3 

41.8 

25.7 

21.9 

18.4 

18.1 

18.2 

G20 

2k 

50k 

100 

2 

T194.8 

77.5 

49.3 

44.3 

35.0 

34.8 

33.0 

G21 

2k 

50k 

50 

3 

T194.9 

83.4 

50.8 

42.2 

36.5 

31.7 

33.5 

G22 

2k 

50k 

0 

1 

T133.8 

64.4 

44.5 

37.9 

32.1 

32.0 

27.9 

Totals 

1723.5 

866.8 

588.1 

491.3 

! 

441.6 

426.8 

415.7 

-20 


Table  X  gives  speedup  resiilts  for  problems  G8  through  G22,  and  the  results  are 
graphed  for  problems  G14,  G15,  G16  and  problems  G20  ,G21,  and  G22.  Although  the 
percentage  of  capacitated  arcs  has  little  effect  on  speedup  for  this  problem  class,  the 
rectangularity  ratio  is  importeint  in  terms  of  efficiency.  Problems  G8-16  have  the  smallest 
ratio  of  arcs  to  nodes,  and  TPGRNET  yields  the  smallest  speedup  for  these  problems.  On 
the  other  hand,  TPGRNET  has  an  average  speedup  of  5.4  on  15  processors  for  problems 
G20  ,  G21  and  G22,  which  have  the  largest  arcs/nodes  ratio.  Since  TPGRNET  achieves 
most  of  its  parallelism  from  pricing  arcs  in  parallel,  the  dependence  of  the  efficiency  of 
TPGRNET  on  the  arcs/nodes  ratio  is  understandable. 


Table  X  TPGRNET  speedups  for  problems  G8  through  G22 


Prob 

nds 

arcs 

cap 

qtree 

Number 

1  5 

of 

7 

Processors 

9  11 

(Sequent) 

13  15 

GS 

21c 

13k 

1.0 

1.7 

2.6 

3.2 

3.1 

3.0 

3.2 

G9 

2k 

13k 

Mm 

1.0 

1.7 

2.6 

3.1 

3.0 

3.3 

3.2 

GIO 

2k 

13k 

0 

1.0 

1.9 

2.7 

3.4 

3.7 

3.6 

3.6 

Gll 

4k 

26k 

100 

1.0 

2.0 

2.7 

3.5 

3.4 

3.6 

3.8 

G12 

4k 

26k 

50 

1 

1.0 

1.8 

2.8 

3.1 

3.4 

3.6 

3.6 

G13 

4k 

26k 

0 

3 

1.0 

1.7 

2.5 

2.9 

3.4 

3.3 

3.3 

G14 

6k 

39k 

100 

5 

1.0 

1.6 

2.4 

2.8 

3.2 

3.1 

3.2 

G15 

6k 

39k 

50 

1.0 

1.4 

1.9 

2.4 

2.6 

2.7 

2.6 

G16 

6k 

39k 

0 

1.0 

1.5 

2.3 

2.7 

3.0 

3.4 

3.4 

G17 

2k 

25k 

100 

1.0 

2.2 

3.5 

4.4 

4.6 

4.9 

5.3 

G18 

2k 

25k 

50 

1.0 

1.8 

2.6 

3.4 

3.4 

3.6 

4.1 

G19 

2k 

25k 

0 

1.0 

1.9 

3.1 

3.7 

4.4 

4.4 

4.4 

G20 

2k 

50k 

100 

1.0 

2.5 

3.9 

4.3 

5.5 

5.5 

5.9 

G21 

2k 

50k 

50 

1.0 

2.3 

3.8 

4.6 

5.3 

6.1 

5.8 

G22 

2k 

50k 

0 

1 

1.0 

2.0 

3.0 

3.5 

4.1 

4.1 

Averages 

1.0 

1.9 

2.9 

3.5 

3.8 

4.0 

IB 

I'  -  I  ■  r  '  t~  ■  ■  I-  I  ■  ■ 

6  7  8  9  lO  11  12  13  14 

NUMBER  OE  PROCESSORS 


15 


Figtire  6  Speedups  for  TPGRNET 


3.3  Results  for  MAGEN  Problems 


Table  XI  gives  results  for  a  group  of  large  problems  generated  by  MAGEN,  the  genera¬ 
tor  used  in  Clark  and  Meyer  [12].  This  is  a  modification  of  GTGEN,  a  generator  described 
in  Chang  and  Engquist  [8].  All  problems  have  30,000  nodes  and  more  than  300,000  arcs. 
The  precise  MAGEN  input  data  for  these  problems,  as  well  as  optimal  objective  function 
values  Me  given  in  Clark  [10].  MAGEN  generates  random  bipartite  generalized  network 
problems  ,  but  allows  the  user  to  specify,  roughly,  the  granularity  of  the  generated  problem, 
(i.e.,  the  user  may  adjust  the  number  of  quasi-trees  in  the  optimal  basis).  Since  problems 
4.00  through  4.50  have  very  different  granularities,  the  effect  of  granularity  on  the  efficiency 
of  TPGRNET  and  PGRNET  can  be  studied  by  solving  these  problems.  Although  MAGEN 
was  motivated  by  the  need  to  compare  parallel  methods  whose  performance  was  depen¬ 
dent  on  test  problem  structue,  it  should  also  be  noted  that  even  from  a  serial  computing 
viewpoint,  granularity  has  a  significant  impact  on  solution  time.  For  example,  GRNET2 
solves  4.50  sixteen  times  fzister  than  4.00,  even  though  it  does  only  30%  fewer  pivots. 
This  means  that  the  pivots  in  4.50  are  relatively  fast,  due  to  the  fact  that  quasi-trees  are 
smaller  on  average  than  those  of  problem  4.0.  PGRNET  yields  am  impressive  speedup  of 
11.1  over  GRNET2  for  4.50,  because  the  quasi-trees  are  numerous  in  the  optimal  basis 
(and  in  the  intermediate  bases).  The  serial  version  of  GENFLO  outperforms  GRNET2  on 
problem  4.50  in  terms  of  CPU  time,  but  GRNET2  significantly  outperforms  GENFLO  for 
the  more  difficult  problems  in  terms  of  both  CPU  time  and  the  number  of  pivots.  Looking 
at  problem  4.00,  one  sees  that  the  fastest  serial  algorithm  is  GRNET2,  and  the  fastest 
parallel  algorithm  is  TPGRNET.  The  shared  candidate  list  and  parallel  pricing  strategy 
of  TPGRNET  makes  TPGRNET  outperform  PGRNET  in  terms  of  the  number  of  pivots 
for  all  problems,  but,  except  for  problem  4.00,  this  cannot  compensate  for  the  parallel 
pivoting  of  PGRNET.  In  all  of  the  other  problems,  PGRNET  outperforms  TPGRNET  in 
terms  of  CPU  time  because  PGRNET  executes  pivots  in  parallel. 


Problem  # 


#  qtrees  at  optimality 


#  nodes 


if  arcs 


#  pivots  GENFLO 

#  pivots  GRNET2 

#  pivs  19  procs  PGRNET 
if  pivs  19  procs  TPGRNET 


CPU  secs.  GENFLO 
CPU  secs.  GRNET2 
CPU:  19  procs  PGRNET 
CPU:  19  procs  TPGRNET 


Speedup  PGRNET 
Speedup  TPGRNET 


***  Did  not  finish  after  14  hours. 


4.01 


30,000 


322,289 


328,711 

365,633 

265,774 


*** 

22,434 

5,829 

3,390 


3.8 

6.6 


139 


30,000 


322,428 


459 


30,000 


322,748 


337,907 

320,937 

345,325 

288,279 


4.05 

4.10 

776 

1,490 

30.000 

30,000 

313,982 

289,937 

308,284 

288,856 

326,323 

300,496 

272,322 

263,411 

5,121 

2,436 

3,096 

2,340 

364 

223 

721 

470 

7,376 


30,000 


329.665 


244,635 

228,819 

231,507 

221,544 


1,038 

1,388 

124 

211 


C-23 


4,00* 

4.50 


4.10 

4.05 

4.03 

4.01 


Figure  7  Speedups  for  PGRNET  and  TPGRNET  for  MAGEN  problems 


Figure  8  and  Tables  XII  and  XIII  show  results  for  two  problems  with  more  than  a 
million  variables.  The  problem  reported  in  Table  XII  has  tighter  capacity  constraints  than 
does  the  Table  XIII  problem.  Both  of  these  problems  are  small  grained,  so  they  can  be  solved 
quite  efficiently  by  PGRNET.  The  best  speedup  was  achieved  on  the  tightly  constrained 
problem  for  which  a  speedup  of  13  was  achieved  using  15  processors.  Note  that  the  tightly 
constrained  problem  was  considerably  more  difficult  to  solve  with  the  serial  version  of  the 
code,  so  that  there  was  more  potential  for  improvement  with  a  parallel  approach. 
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Figure  8  Speedups  for  PGRNET  (  #  arcs  >  1,000,000) 


Table  XII  PGRNET  results  for  tightly  constrained  problem 


program 

#  arcs 

#  nodes 

4^  qtrees 

time 

pivots 

maj  page  swap 

serial 

1,267,185 

30,000 

14,859 

8,415 

706,776 

914,478 

15  procs 

1,267,185 

30,000 

14,859 

660 

715,168 

101,866 

Table  XIII  PGRNET  results  for  loosely  constrained  problem 


program 

44  £Lrcs 

44  nodes 

44  qtrees 

time 

pivots 

maj  page  swap 

serial 

1,267,185 

30,000 

14,859 

3,305 

184,379 

363,755 

15  procs 

1,267,185 

30,000 

14,859 

490 

186,969 

45,672 
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4.  SUMMARY  AND  CONCLUSIONS 


The  availability  of  powerful  parallel  computers  has  generated  widespread  interest  in 
the  development  of  new  optimization  algorithms  for  such  machines.  The  parallel  algorithms 
and  software  reported  in  this  investigation  demonstrate  the  effectiveness  of  these  advanced 
computers  for  the  optimization  of  generalized  networks.  Although  these  methods  were 
developed  and  tested  for  a  particular  multiprocessor  system  with  shared  memory  (Sequent 
Symmetry  S81),  they  can  be  used  with  any  shared  memory  parallel  processing  system. 

In  our  empirical  study  we  found  that  our  serial  network  software  is  at  least  forty  times 
faster  than  MPSX  for  pure  network  problems  and  is  at  least  an  order  of  magnitude  faster 
than  MPSX  for  generalized  networks.  We  also  demonstrated  that  relaxing  the  restriction 
that  at  least  one  of  the  multipliers  associated  with  an  arc  be  +1  results  in  an  additional 
computational  expense  of  only  ten  percent. 

We  believe  that  the  best  current  serial  software  for  these  problems  is  GENNET  [5]  and 
GRNET2  [12]  and  we  began  our  study  of  parallel  algorithms  with  these  codes.  GENFLO  (a 
two-multiplier  version  of  GENNET)  and  GRNET2  provided  the  best  single  processor  times 
for  the  empirical  analysis  presented  in  this  study.  For  a  comparison  with  parallel  codes, 
both  codes  were  run  and  the  smaller  serial  time  was  used  in  speedup  calculations  for  the 
parallel  codes.  In  order  to  assess  the  effectiveness  of  our  paradlel  approaches,  we  considered 
generalized  networks  with  a  leirge  rtinge  of  connectedness  in  optimal  solutions,  since  this 
structure  was  a  key  factor  in  parallel  efficiency.  The  data-parallel  PGRNET  method  proved 
to  be  more  efficient  on  problems  with  many  quasi-trees  in  the  optimal  solution,  while  the 
control-parallel  TPGRNET  algorithm  was  more  effective  on  problems  with  relatively  few 
quasi-trees.  The  best  speedup  was  achieved  on  a  tightly  constrained  problem  having  30,000 
nodes  and  over  1.2  million  arcs.  For  this  problem  PGRNET  achieved  a  speedup  of  thirteen 
on  fifteen  processors.  Clearly,  the  lower  speedups  reported  here  for  problems  with  small 
numbers  of  quasi-trees  in  the  optimal  solutions  indicate  that  both  parallel  approaches 
encountered  difficulties  in  such  cases  in  terms  of  scale-up  to  larger  numbers  of  processors, 
in  the  sense  that  little  improvement  is  demonstrated  as  the  niunber  of  processors  increases 
beyond  10.  One  extension  that  we  are  currently  investigating  to  more  effectively  utilize 
additional  processors  in  a  control-parallel  approach  is  to  employ  them  as  “ratio  processors” 
that  perform  ratio  tests  (or  approximations  to  ratio  tests)  on  candidate  arcs  generated  by 
the  pivoting  processors.  Particidaxly  in  the  case  of  difficult  problems  in  which  the  number 
of  pivots  is  large  relative  to  the  number  of  nodes  and  arcs,  this  approach  holds  the  promise 
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of  increasing  speedups  by  significantly  reducing  the  total  number  of  pivots.  Another  future 
area  of  research  that  we  will  be  pursuing  is  the  investigation  of  the  extension  of  the  control- 
parallel  approach  to  general  linear  programs.  Clearly,  the  allocation  of  tasks  to  processors 
for  the  general  simplex  method  can  be  done  in  the  same  manner  as  TPGRNET.  Relevant 
issues  include  the  effect  of  increased  pricing  and  pivoting  times  for  general  linear  programs, 
alternatives  to  full  ratio  tests  in  the  case  that  ratio  processors  are  utilized,  and  the  effect 
of  increased  data  dependency  of  the  pricing  and  pivoting  tasks. 
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Computational  Comparison  of  Sequential  and  Parallel  Algorithms 
For  the  One-to-One  Shortest-Path  Problem 


Richard  V.  Helgason  f 
Jeffery  L.  Kennington  7 
B.  Douglas  Stewart  J 

Four  new  shortest-path  algorithms,  two  sequential  and  two  parallel,  for  the  source  to  sink  shortest- 
path  problem  are  presented  and  empirically  compared  with  five  algorithms  previously  discussed 
in  the  literature.  The  new  algorithm,  S22,  combines  the  highly  effective  data  structure  of  the 
S2  algorithm  of  Dial,  Glover,  Karney,  and  Klingman,  with  the  idea  of  simultaneously  building 
shortest-path  trees  from  both  source  and  sink  nodes,  and  was  found  to  be  the  fastest  sequential 
shortest-path  algorithm.  The  new  parallel  algorithm,  PS22,  is  based  on  S22  and  is  the  best  of  the 
parallel  algorithms.  We  also  present  results  for  three  new  S22-type  shortest-path  heuristics.  These 
heuristics  find  very  good  (often  optimal)  paths  much  faster  than  the  best  shortest-path  algorithm. 


Since  the  late  fifties  when  its  first  solution  methods  were  developed,  the  shortest-path  problem  has 
become  one  of  the  fundamental  problems  in  the  areas  of  combinatorial  optimization,  computer  science,  and 
operations  research.  Algorithms  and  applications  are  commonly  found  in  the  important  books  in  these  areas 
(see  for  e.^ample  Berge  and  Ghouila-Houri  (1962),  Bertsekas  and  Gallager  (1987),  Even  (1979),  Hu  (1982), 
Jensen  and  Barnes  (1980),  Lawler  (1987),  Papadimitriou  and  Steiglitz  (1987),  Quinn  (198*1),  Rockafellar 
(1984),  and  Tarjan  (1983)).  The  study  of  this  problem  has  been  motwated  by  both  its  elegant  mathematical 
structure  and  its  many  practical  applications.  Our  recent  interest  in  this  problem  was  occasioned  by  the 
need  to  solve  shortest-path  subproblems  in  several  mathematical  optimization  procedures  we  are  developing 
in  an  MIMD  parallel  computing  environment. 

E.xcellent  surveys  of  the  many  shortest-path  problem  variations  may  be  found  in  Deo  and  Pang  (1984) 
and  Gallo  and  Pallottino  (1986).  A  survey  of  techniques  and  computational  comparisons  may  be  found  in 
Gallo  and  Pallottino  (1988),  in  Dial,  Glover,  Karney,  and  Klingman  (1977)  and  (1979),  in  Klingman.  Mote, 
and  Whitman  (1978),  in  Glover,  Glover,  and  Klingman  (1984),  in  Desrochers  (1987),  and  in  Divoky  (1987). 
The  methods  are  grouped  into  two  general  classes;  laiel-stitinp  algorithms  and  labtl-correciing  algorithms. 
Dijkstra  (1959)  is  credited  with  the  first  label-setting  algorithm  and  any  algorithm  that  uses  this  approach 
has  been  considered  a  particular  implementation  of  Dijkstra’s  original  algorithm  (see  Gallo  and  Pallottino 
(1986)). 

Typically,  the  shortest-path  problem  is  one  that  requires  the  shortest-path  from  a  single  source  node,  say 
5,  to  all  other  nodes  in  a  network.  The  solution  can  be  represented  as  a  shortest-path  tree  rooted  at  s.  In  this 
paper  we  are  concerned  with  the  problem  of  finding  the  shortest-path  between  a  source  node  and  a  sink  node, 
t.  Dantzig  (1960)  suggested  that  a  pair  of  trees  be  built  with  one  rooted  at  s  and  the  other  rooted  at  t.  No 
stopping  criteria  were  given.  This  strategy  also  appears  in  the  book  by  Berge  and  Ghouila-Houri  (1962)  with 
an  incorrect  stopping  criterion.  Nicholson  (1966)  was  the  f  rst  to  present  a  correct  analysis  of  the  Dijkstra 
two-tree  algorithm.  Hart,  Nilsson,  and  Raphael  (1968)  presented  a  one-tree  algorithm  utilizing  heuristic 
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cost  functions.  They  included  the  case  in  which  lower  bounds  on  distances  between  nodes  are  available  and 
proved  that  optimal  paths  are  obtained.  Pohl  (1971)  extended  these  results  for  the  bi-directional  algorithm, 
antedating  Mohr  and  Pasche  (1988),  who  presented  similar  results.  Additional  discussion  may  be  found  in 
the  survey  by  Dreyfus  (1969),  where  he  conjectured  that  building  trees  from  s  and  t  would  be  ineffective  for 
this  problem. 

Few  empirical  studies  have  been  reported  in  the  literature  for  the  two-tree  algorithms.  Pohl  (1969) 
presented  results  that  have  received  little  recognition,  and  no  further  studies  appear  until  recently.  Helgasop, 
Kennington,  and  Stewart  (1988)  solved  shortest-path  problems  on  twelve  NETGEN  networks  and  twelve 
dense  bipartite  graphs  using  a  two-tree  Dijkstra  algorithm.  Mohr  and  Pasche  (1988)  solved  for  shortest-patns 
in  three  grid  networks  and  a  network  representing  a  road  map  of  Switzerland,  using  the  two-tree  Dijkstra 
algorithm  as  well  as  their  version  of  the  two-tree  algorithm  for  networks  with  lower  bounds  available.  They 
also  simulated  results  for  parallel  algorithms  if  two  processors  were  available. 

The  motivation  for  this  study  was  to  implement  two-tree  algorithms  using  more  efficient  data  structures 
than  the  classical  Dijkstra  algorithm,  and  to  actually  solve  shortest-path  problems  using  two  processors. 
Four  new  algorithms  were  developed:  sequential  and  parallel  two-tree  algorithms  based  on  Dial  (1969)  (the 
SI  code  in  Dial  et  al.  (1979)),  and  sequential  and  parallel  two-tree  algorithms  based  on  the  S2  code  in  Dial 
et  al.  (1979).  These  codes  are  compared  with  the  classical  Dijkstra,  SI,  S2,  two-tree  Dijkstra.  and  parallel 
two-tree  Dijkstra  codes. 

While  performing  this  study,  it  was  noted  that  the  two-tree  S2  algorithm  finds  “good”  paths  quickly. 
Three  heuristic  path  algorithms  were  developed  by  simply  stopping  early  while  executing  the  two-tree  S2 
algorithm.  Often  times  these  heuristics  find  a  shortest-path,  although  they  do  not  prove  it  is  so,  and  these 
paths  are  found  much  faster  than  the  fastest  optimal  algorithm.  We  present  computation  times  as  well  as 
measures  of  closeness  to  optimality  for  these  heuristics. 

1.  THE  ALGORITHMS 

This  section  presents  the  definitions,  notation,  and  nine  algorithms  representing  the  codes  used  in  our 
computational  study.  The  notation  and  presentation  are  based  on  that  found  in  Gallo  and  Pallattino  (1986). 
The  input  for  each  algorithm  is  a  directed  graph  G  =  with  a  node  set  N  and  an  arc  set  .4.  Associated 
with  each  arc  (i,j)  €  A  is  a  length  /,y.  A  shortest-path  is  desired  between  two  nodes  s  and  t  and  it  is 
assumed  that  such  a  path  exists.  We  also  assume  for  all  algorithms  that  lij  >  0  for  all  (i,j)  £  A.  For 
efficient  implementation  it  is  assumed  that  G  is  given  in  the  form  of  arc-lists.  Specifically,  the  forward  star 
for  node  u  £  N,  FS(u),  is  the  set  of  all  arcs  (u,j)  £  A.  Six  algorithms  require  a  backward  star,  BS(v), 
defined  as  the  set  of  al!  arcs  (t,  v)  £  A  for  node  v  £  N. 

The  basic  working  entities  of  each  algorithm  include  a  set  of  labels,  ,  for  node  distances  from  the  root 
of  a  shortest-path  tree  T,  a  set  of  predecessors,  pu,  for  nodes  in  the  tree,  and  the  set  of  candidate  nodes, 
Q.  In  the  algorithms  based  on  the  SI  and  S2  algorithms,  the  set  Q  will  be  divided  into  subsets,  or  buckets, 
that  will  contain  candidate  nodes  that  are  the  same  distance  from  the  root  node.  This  requires  that  each 


ate  length  be  integer  to  correspond  to  an  index.  For  ease  of  presentation,  we  will  let  F  be  the  set  of  indices, 
{0, ..., /mai}  for  the  buckets  of  Q,  where  Imax  =  max{/u„  :  (u,v)  6  A).  The  notation  will  be  modified  for 
the  bi-directional  algorithms  by  using  the  superscripts  s  or  i  to  indicate  the  root  of  the  tree.  The  algorithms 
terminate  with  the  length  of  the  shortest-path  from  s  to  t  and  a  small  set  of  nodes,  J,  used  to  identify  a 
shortest-path.  A  shortest-path  from  s  to  t  is  implicit  in  the  predecessor  labels. 

1.1  The  classical  Dijkstra  algorithm 

Dijkstra’s  classical  algorithm  (1959)  begins  at  node  s  and  builds  a  shortest-path  tree  T  in  which  the 
shortest-path  from  s  to  any  node  in  the  tree  is  known.  When  node  t  is  placed  in  the  tree  we  have  a  minimum 
length  directed  path  from  s  to  f  and  may  terminate.  As  mentioned  before,  the  algorithms  discussed  here 
differ  in  the  way  the  candidate  nodes  are  placed  in  and  retrieved  from  the  set  Q.  The  implementation  here 
for  the  classical  Dijkstra  algorithm  (Dl)  searches  the  list  of  candidate  nodes,  Q,  for  minimum  label  nodes 
and  places  ail  such  nodes  in  the  set  R.  Then  all  the  nodes  in  R  can  be  scanned  one  after  the  other.  When 
there  are  many  nodes  tied  with  the  minimum  label,  searches  are  avoided  with  only  minimal  effort.  The 
algorithm  may  be  stated  as  follows: 

Procedure  Dl(s,t) 

begin 

initialize: 

pi  «—  0,  d,  <—  oo  for  all  i  €  N\  Q  *-  0; 
d,  —  O.p,  ~s,i2— {s},T  — 0; 
while  t  ^  R 

for  each  u  €  /?  do 

for  each  (u,  v)  €  FS{u)  such  that  d^  -r  luv  <  dv  do 
dv  du  -1-  /uv  I 
Pv  —  «: 

if  V  ^  Q  then  Q  *—  Q  U  {u}; 

end 

T-Tu{u}; 

end 

comment:  search  Q  for  minimum  label  nodes  and  place  in  R 
a  <—  min{di  :  i  €  (?},  —  {» :  d,=a}, Q  —  Q\R; 

endwhile 
end 

1.2  The  SI  algorithm 

This  one-tree  algorithm  is  implemented  as  proposed  by  Dial  (1969).  As  in  algorithm  Dl,  it  selects  a 
minimum  label  node  to  scan  each  iteration,  but  the  nodes  in  Q  are  stored  in  buckets  according  to  their 
distance  labels.  More  specifically,  Imax  +  1  buckets  are  required,  where  Imax  =  max{/uti  :  («.  v)  €  A}  and 
a  node  «  is  stored  in  bucket  r  if  du  =  r(mod  Imax  +  1).  Only  nodes  with  equal  distance  labels  will  be  in 
bucket  z  and  only  Imax  +  1  buckets  are  required,  because  for  a  node  u  with  minimum  label,  we  have  that 
for  each  v  €Q,dv  <d„  <  d„  -f  Imai.  The  buckets  are  implemented  efficiently  as  two-way  linked-lists.  IVom 
minimum  label  node  u,  the  next  non-empty  bucket  contains  the  next  minimum  label  node(s)  and  the  effort 
to  search  an  entire  list  as  in  algorithm  Dl  is  greatly  reduced.  The  trade-off  is  increased  effort  managing 
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the  two-way  linked-list  each  time  a  node  has  an  improved  distance  label.  The  algorithm  may  be  stated  as 
follows: 

Procedure  Sl{s,t) 

I  begin 

initialize: 

Pi  •—  0,dj  ♦—  oo  for  all  i  E  N,  Qt  *—  <b  for  z  = 

Qo  * —  {^}i  OiPj  '  s,  T" «—  0; 

while  t 

let  z  be  the  next  index  such  that  Q,  ■, 

.  for  each  u  E  Qz  do 

for  each  (u,  v)  E  FS{u)  such  that  d„  +  /„«  <  d«  do 
a  *—  dv(mod  Imax  +  1); 
dv  «—  d„  +  /u«; 

6  *—  d„(mod  Imax  +  1); 

Pv  *—  u; 

)  Qa  —  Qa\{v}', 

Qb  ^<34U{v}; 

end 

T-Tu{n}; 

end 

endwhile 

^  end 

1.3  The  S2  algorithm 

This  one-tree  algorithm  is  based  on  an  idea  due  to  Dantzig  (1960)  and  the  implementation  here  is  due 
to  Dial  et  al.  (1977).  It  requires  that  each  FS{u)  for  Zk&uE  N  be  sorted  in  shortest  first  order.  Given  this, 
the  observation  can  be  made  that  the  entire  forward  star  of  a  node  u  need  not  be  scanned  all  at  once  in  that 
the  node,  say  v,  that  is  first  updated  will  have  a  distance  label  less  than  or  equal  to  any  subsequent  nodes 
updated  from  node  u.  The  node  v  is  placed  on  a  one-way  linked-list,  paired  with  u,  at  a  level  -i-  4,.  In 
stating  the  algorithm  below,  we  use  k{u)  as  a  counter  to  point  to  the  next  arc  in  FS{u)  to  be  scanned.  The 
^  node  u;fc(u)  is  the  corresponding  node  of  this  arc.  The  length  of  each  forward  arc-list  is  given  by  A(u).  It 

should  be  noted  that  the  actual  implementation  does  not  use  a  counter.  A  simple  minus  sign  in  the  forward 
star  array  can  indicate  the  next  arc  to  be  scanned. 

Nodes  may  be  duplicated  on  the  linked-list,  therefore  no  deleting  is  required.  Even  so,  the  number  of 
1  nodes  on  the  linked-list  will  not  e.\ceed  the  total  number  of  nodes  since  only  one  per  scanned  node  is  allowed. 

Q  is  searched  as  in  algorithm  SI  for  the  next  non-empty  bucket  and  minimum  label  node.  If  the  node’s 
paired  predecessor  is  not  its  current  predecessor,  the  node  is  already  in  the  shortest-path  tree  and  may  be 
discarded.  The  algorithm  may  be  stated  as  follows  (see  Gallo  and  Pallottino  (1986)): 
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Procedure  S2(s,<) 

begin 

initialize: 

Pi  *-Q,di  ^  oo,k{i) «—  l,h{i)  =  |/’S(»)|  for  all  i  G  iV; 

0  for  r  =  1, lmax\  Qq  *-  {s},  d,  —  0,p,  —  s,T 
while  t  §?  r 

let  z  be  the  next  index  such  that  Q,  ^  0; 
for  each  u£Q,  do 

conunent:  determine  next  node  in  FS(u) 

V  ^  ti'k(u): 

g,  -  gA{«}: 

comment:  determine  new  candidate  arc 
INSERT(u); 

r  — Tu{u}: 

if  u  ^  g  then  E^SERT(v); 
end 

endwhile 

end 

Procedure  INSERT(z) 
begin 

k{x)  k(x)  +  1; 

y  ijuicr); 

while  k(x)  <  h(x)  and  dx  +  Ixy  >  dy  do 
k{x)  —  k{x)  +  1; 

y 

endwhile 
if  k(x)  <  h{x)  then 
Py  *—  x; 
dy  dx  +  Ixy ; 
a  dy  (mod  Imax  +  1); 

Qa  ~~  Qa  U  {®}i 

endif 


1.4  The  two-tree  Dijkstra  algorithm 

The  two-tree  Dijkstra  algorithm  builds  a  pair  of  shortest-path  trees,  one  from  s  and  o.ie  from  t.  The 
tree  rooted  at  t  is  analogous  to  the  one  rooted  at  s,  but  scans  backward  stars,  Euid  its  predecessors  are  the 
heads  of  arcs  rather  than  tails.  The  two  trees  are  grown  in  alternate  steps  and  t'';mination  is  triggered 
when  a  node  appears  in  both  trees.  It  should  be  noted  however  that  the  node  appearing  in  both  trees  is  not 
necessarily  on  a  shortest-path  (see  Nicholson  (1966)  for  a  proof  of  termination  criteria).  Testing  on  ramdom 
graphs  showed  that  about  72%  of  the  time  this  node  will  be  on  the  shortest-path.  A  search  is  performed 
over  nodes  in  both  trees  to  find  a  node,  say  r,  such  that  d*  -b  d^  is  a  minimum.  A  shortest-path  from  s  to  t 
may  be  found  by  following  predecessors  from  r  to  s  amd  from  r  to  t  in  each  tree,  respectively.  We  define  J 
as  the  set  of  nodes  which  can  be  used  to  identify  a  shortest-path.  The  algorithm  requires  twice  the  storage 
of  algorithm  D1  and  may  be  stated  as  follows: 
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Procedtire  D2(s,t) 
begin 

initialize: 

I  Pi  0,  dj  ♦-  oo  for  all  i  €  //;  0; 

d;-O,p;-s.r*-0,i2*-{5}; 
p|  ^  0,d|  -  oo  for  Mi€N-,Q'  ^  0; 
d{-O,p|-t,r-0,i2‘-{t}; 
whUe  T'nr*  =  0  do 
for  each  u  €  iZ*  do 

for  each  (ri.v)  €  FS{u)  such  that  dj  +/„«  <  dt  do 
^  dJ-di  +  /„„; 

Pi  — 

if  V  ^Q*  then  Q‘  •— Q‘  U  {u}; 
end 

T*  ^T*U{u}; 
end 

I  comment:  search  Q‘  for  minimum  label  nodes  and  place  in  R* 

a^min{dj  :  i  €  Q"}.  iZ*  -  ;  dJ=o},Q*  ^ 

for  each  v  €  R'  do 

for  each  (u,u)  G  BS{v)  such  that  d|,  +  /„„  <  dj,  do 

dy  *“  d|,  -p  /yy  , 

pI  «-  «; 

if  V  ^  Q'  then  g'  —  g‘  U  {u}; 

*  end 

V  — r*U{t/}; 
end 

comment:  search  g‘  for  minimum  label  nodes  and  place  in  iZ* 
a  ^  min{d|  :  i  €  g‘},  iZ'  —  {t  :  dj=a},g‘  ^  Q‘\R*; 
endwhile 

I  comment:  stopping  criterion  met 

.5*-min{df +  d|  :  ieT’ur}; 

J-{i€T*ur  :dj  +  d|=;3}; 
end 


^  1.5  The  two-tree  Si  algorithm 

The  two-tree  SI  algorithm  builds  a  pair  of  shortest-path  trees,  one  from  s  and  one  from  t,  using  SI  data 
structures  for  each  tree.  As  in  algorithm  D2,  termination  is  triggered  when  a  node  is  first  found  to  be  in 
both  trees.  The  node  r  such  that  d*  -(-  d^  is  minimum  gives  the  shortest-distance  from  s  to  t  and  lies  on  a 
^  shortest-path.  The  algorithm  requires  twice  the  storage  of  SI  and  may  be  stated  as  follows: 

Procedure  S12(s,t) 
begin 

initialize: 

Pj  «—  0,  d*  •—  oo  for  all  i  €  N;  gj  «—  0  for  z  =  1, ...,  Imax; 

,  g5*-W,rf;-o,p;-s,r*-0; 

>  p|  ♦—  0,d|  ♦—  00  for  all  1  €  JV;  gj  «—  0  for  z  =  l,...,/moz; 
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while  T*  n  T*  =  0  do 

let  z  be  the  next  index  such  that  Q*  ; 
for  each  u  E  Q‘  do 

Q*  _  QJ\{w}; 

for  each  (u,t;)  €  •f’5(u)  such  that  dj  +  <  d*  do 

a  «—  d"(mod  /max  +  1); 
dl  *—dl+  /„* ; 
b  ♦—  (l'(inod  /max  +  1); 

pI  *-«: 

Ql  -  u  {«}; 

end 

T'-T'U{u}; 

end 

let  2  be  the  next  index  such  that  Q'  #  0  ; 
for  each  v  eQ\  do 

Q\*~Q\\{v}\ 

for  each  (u,  v)  €  BS{v)  such  that  d{,  +  /„«  <  d{,  do 
a  —  (i[,(mod  /max  +  1); 
d‘u  dj,  +  ; 

b  <—  d^(mod  /max  -f  1); 

pI  v; 

Q‘a-Q!,\{u}; 

Ql^QlU{u}; 

end 

r  >-T‘U{i;}; 

end 

endwhile 

comment:  stopping  criterion  met 

/?^min{dj +  d|  .-ier’ur}; 

;df +  d|=;3}; 

end 


1.6  The  two*tree  S2  algorithm 

As  in  the  previous  two-tree  algorithms,  the  two-tree  S2  algorithm  uses  mirror  S2  data  structures  to  build 
two  shortest-path  trees.  At  first  one  would  expect  the  stopping  procedure  to  be  the  same  as  in  the  previous 
two-tree  algorithms,  namely,  when  a  node  is  in  both  trees,  find  the  minimum  d’  +  dj  for  all  i  €  T*  U  T*. 
However,  at  the  time  a  node  is  first  placed  in  its  second  tree,  we  are  not  quite  ready  to  search  for  such  a 
minimum  doubly  labelled  node.  In  Nichobon  (1966)  such  a  node  b  proven  to  be  on  a  shortest-path  because 
each  node  in  both  trees  has  had  its  arc-lbt  fully  scsmned.  In  the  S2  implementation  for  each  tree  this  is 
not  the  case.  In  fact,  this  b  the  advantage  of  the  one-tree  S2  algorithm.  In  two  directions  we  must  perform 
additional  scanning  to  meet  the  Nicholson  criterion,  however,  we  will  not  need  to  manage  the  linked-lists. 
All  that  b  needed  b  to  update  dbtance  labeb  and  predecessors.  Actually  we  need  only  scan  a  subset  of  arcs 
from  each  arc-lbt.  Nicholson  proves  that  any  node  that  b  not  in  either  tree  when  a  node  is  first  in  both  trees 
will  not  be  on  a  shortest-path.  If  arcs  were  scanned  to  these  nodes,  they  would  still  not  be  in  either  tree  or 
on  a  shortest-path,  so  updating  then  dbtance  labeb  would  be  wasted  effort.  These  nodes  also  make  up  the 
majority  of  the  arc-lbts.  The  only  arcs  that  need  to  be  scanned  during  the  “mop-up”  phase  are  those  arcs 
that  have  from  nodes  in  one  tree  and  to  nodes  in  the  other  tree.  Since  these  arcs  may  be  scanned  from  either 
node,  it  b  more  efficient  to  consider  these  arcs  from  the  nodes  in  the  smaller  tree.  After  updating  dbtance 
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labels  and  predecessors  we  are  ready  to  seuch  for  the  minimum  doubly  labelled  node  to  find  our  minimum 
distance  from  s  to  f  and  a  shortest-path.  The  algorithm  may  be  stated  as  follows: 

Procedure  S22(s,t) 
begin 

initialize: 

pl  -  0,df  ^  oo,/k(i)  ^  l,/h(i)  =  |FS(t)|  for  aU  i  6  N; 

Ql—dfot  z  =  1, Imax;  Qi  {s],  d‘  0,p’  ^s,T’  0; 

p|  «—  0,d|  *—  OQ,  bk{i)  »-  l,bh{i)  =  |BS(»)|  for  ^  t  6  iV; 

Q*  —  0  for  :  =  1, Imax;  {t},  d}  ^  O.pj  *- 1,  T*  ^  0; 
while  T'  n  r‘  =  0  do 

let  z  be  the  next  index  such  that  Qt  ; 
for  each  u  £  Q‘  do 

comment:  determine  next  node  in  FS{u) 

V  w>/t(u); 

comment:  determine  new  candidate  arc 
SINSERT(u); 

T'-T*U{u}; 

if  u  (3*  then  SINSERT(n); 
end 

let  z  be  the  next  index  such  that  Ql  ^  S  ; 
for  each  v  E  do 

comment:  determine  next  node  in  BS{u) 

comment:  determine  new  candidate  arc 
TINSERr(v); 

r  -  r  uM; 

if  u  ft  <3‘  then  TINSERT(u); 
end 

endwhile 

comment:  mop-up  phase  from  smaller  tree 
if  ir*|  <  IT*!  then 
for  each  u  €  T’  do 

for  each  (u,y)  €  FS{u)  do 
if  y  €  then 

if  d’  -t-  /„y  <  dy  then 

Pi  w; 

endif 

endif 

end 

end 

else 

for  each  v  6  T*  do 

for  each  {w,v)  €  BS(v)  do 
if  in  €  T*  then 

if  dj  -I-  <  d^  then 

di,  •“  dj  -t"  fiBv  i 
pi,  *-v; 
endif 
endif 
end 
end 
endif 
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comment:  stopping  criterion  met 
0^mm{dt  :i€r*ur'}: 
J-{«€r*UT»  :dl+dl=0}; 
end 

Procedure  SINSERT(i) 
begin 

fk{x)  ^  fk{x)  +  1; 
y  ^  y>Jk(xy, 


while  fk(x)  <  fh(x)  and  dj  +  l^y  >  dj  do 


fk{x)  -  fkix)  +  1; 

y  —  «'/i(r): 

endwhile 


/^(*)  <  /^(*)  then 
Pj  ^  x; 
i-dl  +  l^y-. 


a  *—  dy(mod  Imax  +  1); 

q;-(3;u{x}; 

endif 

end 

Procedure  TINSERT(x) 
begin 

bk{x)  -  bk{x)  +  1; 

y  mk(x)\ 


while  bk{x)  <  bh(x)  and  d^  +  >  dj  do 

bk{x)  <—  bk{x)  +  1; 

endwhile 

if  bk{x)  <  bh(x)  then 
Py 

dj  »-  dj  +/yx: 

a  <—  dJ(mod  Imax  +  1); 

Q!.-Q‘oU{x}; 

endif 

end 


1.7  The  parallel  two>tree  Dijkstra  algorithm 


It  is  readily  observed  that  in  the  two-tree  shortest-path  algorithms,  the  trees  are  independent  of  each 
other.  The  only  requirement  is  to  check  whether  or  not  a  node  is  in  the  opposite  tree.  This  read-only  step 
causes  no  interference  using  multiple  processors.  This  leads  to  the  simplest  asynchronous  parallel  application 
using  two  processors,  one  for  each  tree.  When  one  processor  recognizes  that  a  node  is  in  both  trees,  it  sets  a 
flag  to  tell  the  other  processor  to  And  the  minimum  doubly  labelled  node  in  its  tree  while  it  does  the  same. 
The  minimum  of  the  two  is  the  minimum  path  distance  from  s  to  t.  Again,  a  shortest-path  is  implicit  in 
the  predecessor  labels  beginning  with  the  minimum  doubly  labelled  node.  In  the  algorithms  below,  each 
processor  has  its  own  identifying  number  called  procid.  The  parallel  processing  construct  called  foTk(2) 
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indicates  that  two  processors  are  to  be  used  to  execute  the  sections  until  the  join  construct  is  reached.  The 
algorithm  may  be  stated  as  follows; 


Procedure  PD2(s,t) 

begin 

flag  0; 

conoment:  begin  use  of  two  processors 
fork(2) 

if  procid  =  1  then 
initialize: 

pj  «—  0,  d-  *—  00  for  all  t  6  jV;  0; 

—  o.p5  *-  *-  ^  {^ji 

synchronize  processors 
while  flag  =  0  do 
for  each  u£  R‘  do 

for  each  {u,v)  6  FS(u)  such  that  d’  +luv  <  d^  do 
dj  *“  dy  +  /m  I 

pj  ^  u; 

if  u  ^  Q'  then  Q‘  *—  Q‘  U  {r}; 
end 

-T‘U  {u}; 

if  u  €  T‘  then  flag  1; 
end 

comment:  search  Q"  for  minimum  label  nodes  and  place  in  R‘ 
a  ^  min{df  ;  i  €  Q‘},R’  ^  {«  :  d\=Q},Q‘  ^  Q‘\R^; 
endwhile 

/3,  <— min{dj  +  d|  :  i  6  T‘}; 
end^ 

if  procid  =  2  then 
initialize: 

p|  *-  0,  d|  •—  oo  for  all  i  €  iV;  Q‘  *-  0; 
di-O,  Pi*- <,7^-0,  «'*-{<}; 
synchronize  processors 
while  flag  =  0  do 
for  each  u  6  do 

for  each  (u,u)  €  B5(v)  such  that  d*  +luv  <  dj,  do 
dL  ♦—  d^  +  /uv ; 


Pii 


,<  ^ 


V, 


if  n  €  <3*  then  Q*  <—  Q*  U  {u}; 
end 

r  ^r*u{v}; 

if  V  €  r*  then  flag  «—  1; 
end 

comment:  search  Q*  for  minimum  label  nodes  and  place  in  R* 
a  <—  min{d{  :  i  €  {» :  d*=a),Q*  *-  Q*\R*; 

endwhile 

min{dj  +  d|  :  i  €  T*}; 

endif 

comment:  end  use  of  multiple  processors 
join  processors 

/  ^  {»•  €  r*  ur  :  df  +  di=^}; 
end 
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1.8  The  parallel  two-tree  SI  algorithm 

The  parallel  two-tree  SI  algorithm  is  similar  to  the  parallel  two-tree  Dijkstra  algorithm.  Each  processor 
is  assigned  one  of  the  nodes,  s  or  t,  and  builds  a  tree  using  the  SI  data  structure.  When  a  node  is  found  to  be 
in  both  trees,  a  flag  is  set  to  tell  both  processors  to  find  the  minimum  doubly  labelled  node  in  its  respective 
tree.  The  minimum  of  these  two  gives  the  minimum  distance  path  from  s  to  t  with  a  path  implicit  in  the 
predecessor  labels.  The  algorithm  may  be  stated  as  follows; 

Procedure  PS12(s,t) 

begin 

flag  <-  0; 

comment:  begin  use  of  two  processors 
fork(2) 

if  procid  =  1  then 
initialize: 

p\  —  0,  d-  *—  oo  for  all »  €  jV;  QJ  <—  0  for  =  1, ...,  lmax\ 

synchronize  processors 
while  flag  =  0  do 

let  r  be  the  next  index  such  that  CJJ  5^  0  ; 
for  each  «  €  do 
0!-  - 

for  each  (u,u)  €  FS{u)  such  that  dj,  -f  l^v  <  dl  do 
a  <—  dJ(mod  Imax  +  1); 
dj  d*  -1-  /oo ; 

6  —  dJ(mod  Imax  +  1); 

Pi  — 

Qi'-QJUM; 

end 

-  T’  U  {«}; 

if  u  e  T'  then  flag  <—  1; 
end 

endwhile 

^  min{di -f  dj  :  i  €  r }; 

endif 

if  procid  =  2  then 
initialize: 

p\  —  0,  dj  —00  for  all  i  G  iV;  Qi  —  0  for  z  =  1, ...,  /moz; 

Qo  —  {<}.  —  o,p|  —  t,  r*  <-  0; 

synchronize  processors 
while  flag  =  0  do 

let  z  be  the  next  index  such  that  Qi  ^  0  ; 
for  each  v  €  Qi  do 

Q\  — 

for  each  (u,  v)  6  BS{v)  such  that  d^  -h  /„„  <  dj,  do 
a  «—  dJ,(mod  Imax  -I- 1); 
di  ‘  ^  d-  luv  i 
b  ♦—  d(,(mod  Imax  +  1); 

pI  v; 

end 

T*  -r'U{u}; 

if  V  e  r*  then  flag  1; 
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end 

endwhile 

0t  —  min{dj  +  dj  :  i  6  T'}; 
endif 

comment:  end  use  of  multiple  processors 
join  processors 
—  min{^,,/?,}: 

J^{i€T‘VT  :  d\  +  d\=0y, 
end 


1.9  The  parallel  two-tree  S2  algorithm 

As  in  the  previous  parallel  algorithms,  the  parallel  two-tree  S2  algorithm  assigns  one  processor  to  work 
on  the  tree  rooted  at  s  and  another  processor  to  work  on  the  tree  rooted  at  t.  When  a  node  first  appears  in 
both  trees,  a  flag  is  set  to  initiate  the  mop-up  node  scanning  phase.  As  explained  in  Section  1.6.  we  perform 
the  mop-up  scanning  phase  from  the  smaller  tree  only.  To  perform  this  work  with  two  processors  requires 
each  one  to  share  the  same  data,  namely,  distance  labels.  To  prevent  possible  interference  with  each  other, 
parallel  processing  constructs  called  locks  are  used  so  only  one  processor  at  a  time  may  update  a  distance 
label.  Following  this,  the  processors  are  synchronized  and  the  minimum  doubly  labelled  nodes  are  found  for 
each  tree.  The  minimum  of  these  two  gives  the  minimum  distance  path  from  s  to  t  and  a  shortest-path  is 
implicit  in  the  predecessor  labels.  The  procedures  SINSERT(i)  and  TINSERT(x)  are  as  presented  in  Section 
1.6.  The  algorithm  may  be  stated  as  follows: 

Procedure  PS22(s,t) 
begin 

flag  —  0; 

comment:  begin  use  of  two  processors 
fork(2) 

if  procid  —  1  then 
initialize: 

p\  Oydf  *-  oo,fk(i)  l,/h(j)  =  iF5(i)l  for  all  :  €  N; 

QJ  —  0  for  r  =  l,...,/max;  QJ  —  {s},  d’  —  O.pJ  —  s,  T’  <-  0; 
pj  •—  0,dj  oo,  (>k(i)  <—  1, 6/i(i)  =  |55(z)l  all »  €  jV; 

<3*  <—  0  for  z  =  1,  ...,lmax;  Qq  ^  {t},  dj  »-  0,p{  *-t,  T*  <—  0; 
synchronize  processors 
while  flag  =  0  do 

let  z  be  the  next  index  such  that  ^  0  ; 
for  each  u  €  QJ  do 

comment:  determine  next  node  in  FS{u) 

comment:  determine  new  candidate  arc 
SINSERT(u); 

T*  ^T*U{u}; 
if  u  ^  Q*  then  SINSERT(u); 
if  u  e  T*  then  flag  —  1; 
end 

endwhile 

endif 


if  procid  =  2  then 

pj  <—  *—  oo,6fc(i)  <—  l,6A(i)  =  |BS(»)|  for  all  i  G  iV; 

Q\  ^  0  for  ;  =  1,  ^  {<},  ^  0,p|  <-t,T 

synchronize  processors 
while  flag  =  0  do 

let  z  be  the  next  index  such  that  Q*,  H)  ; 
for  each  v  do 

comment:  determine  next  node  in  BS{u) 

“  ^  Wik(vy, 

Q‘,-Q\\{vh 

comment:  determine  new  candidate  arc 
TINSERT(i;)i 

r  ^  r  u  {v}: 

if  u  ^  Q‘  then  TINSERT(u); 
if  v  G  T'  then  flag  *—  1; 
end 

endwhile 

endif 

comment:  mop-up  phase  from  smaller  tree 
synchronize  processors 
if  |r*|  <  \T‘\  then 

comment:  each  processor  works  on  next  unscanned  node  u  G  T' 
for  each  u  G  T'  do 

for  each  (u,y)  G  FS(u)  do 
if  y  G  T*  U  then 
if  dj  +  luy  <  dy  then 
set  locked 

<  —<i‘u  +  ky-, 

Py  — 

set  unlocked 
endif 
endif 
end 
end 
else 

for  each  v  G  T*  do 

comment:  each  processor  works  on  next  unscanned  node  v  G  T* 
for  each  (w,v)  G  BS{v)  do 
if  u;  G  T*  U  T*  then 
if  dj,  +/„,«<  di,  then 
set  locked 

dl,  dl  +  luiv] 

pi,  *-  v; 

set  unlocked 
endif 
endif 
end 
end 
endif 

synchronize  processors 

if  procid  =  1,0,  ♦—  minidf  -h  dj  : »  G  T*}', 

if  procid  =  2,0,  *—  min{df  -H  dj  :  i  G  T‘}; 

join  processors 
0  ^min{0,,0,}; 


2.  COMPUTATIONAL  EXPERIENCE 

All  nine  algorithms  have  been  coded  in  FORTRAN  and  run  on  a  Sequent  Symmetry  S81  using  either 
one  or  two  Intel  80386  processors.  Several  factors  affect  the  performance  of  shortest-path  codes.  First,  the 
number  of  nodes  is  important  for  Dijkstra-type  algorithms,  whose  majority  of  work  is  searching  a  node¬ 
length  array  for  a  minimum  label  node.  Second,  the  average  degree  (lA|/|iV|)  is  important  because  the 
Dijkstra-type  and  Sl-type  algorithms  must  scan  entire  forward  (or  backward)  stars  each  iteration,  while 
S2-type  algorithms  typically  scan  only  a  subset.  Finally,  the  cost  range  of  a  network  affects  the  length  and 
sparseness  of  the  Q  array  for  Sl-type  and  S2-type  algorithms,  which  are  searched  each  iteration.  The  cost 
range  and  degree  also  affect  the  number  of  nodes  that  tie  with  a  minimum  distance  label,  thereby  reducing 
the  number  of  searches. 

Four  node  levels  (1000,  2000,  3000,  4000),  ten  average  degree  levels  (5,  10,  15,  20,  25,  -50,  75,  100,  125, 
150),  and  three  cost  ranges  (1-100,  1-1000,  1-10000)  were  chosen  as  being  varied  enough  to  demonstrate 
which  factors  were  influencing  performance.  The  total  number  of  random  networks  generated  was  120.  Each 
code  solved  twenty  problems  per  network  using  the  same  randomly  generated  s  and  t  nodes,  yielding  a 
total  of  2400  problems.  Each  data  point  in  Tables  1-3  is  the  sum  of  times  (in  seconds)  to  solve  the  twenty 
problems.  Since  the  S2-type  codes  require  sorted  forward  and  backward  stars,  all  codes  were  given  sorted 
forward  and  backward  stars  to  eliminate  this  as  a  relative  factor  among  them.  It  is  debatable  whether  or 
not  the  sorting  time  should  be  counted  against  the  S2-type  algorithms  since  it  negaires  sorted  arc-lists.  Here 
we  simply  assume  that  the  data  is  available  in  pre-sorted  order  eind  concede  that  if  it  were  not  available,  the 
S2-type  algorithms  would  not  be  appropriate. 

With  nine  codes  and  120  networks,  many  comparisons  and  observations  can  be  made.  We  highlight  the 
major  points  of  interest.  First,  there  is  some  overlap  with  previous  studies  and  we  wish  to  confirm  previous 
results.  As  in  Dial  et  al.  (1979),  we  find  that  Si  is  better  than  S2  on  only  the  smallest  degree  problems.  .A.s 
the  forward  stars  get  larger,  the  savings  of  scanning  only  a  subset  are  realized.  On  four  high  node  number, 
low  degree  networks,  Mohr  and  Pasche  (1988)  found  that  a  D2-type  algorithm  required  about  38%  of  the 
time  needed  by  a  D1  algorithm.  Here  we  found  that  the  averages  for  D2  were  65%,  49%,  and  23%  of  that 
for  D1  for  the  cost  ranges  1-100,  1-1000,  and  1-10000  respectively. 

PS22,  the  parallel  two-tree  S2  code,  is  the  overall  winner.  It  had  the  fastest  time  on  108  out  of  the 
120  networks,  while  PS  12  was  fastest  on  12  networks.  PS22  was  the  fastest  code  when  the  average  degree 
increased  above  10  on  networks  with  1-100  cost  range  and  when  the  average  degree  increased  above  5  on 
networks  with  1-1000  cost  range.  It  was  always  the  fastest  code  when  the  cost  range  was  1-10000.  PD2 
improved  on  the  networks  with  the  lowest  number  of  nodes,  lowest  cost,  and  highest  degree  —  all  factors 
that  reduce  the  number  of  and  time  for  searches  for  minimum  label  nodes  by  increasing  ties.  Were  the  degree 
increased  to  make  these  networks  much  more  dense,  PD2  might  become  more  competitive. 

Overall,  S22  was  the  fastest  sequential  code,  and  was  even  faster  than  PS12  and  PD2.  It  had  the  fastest 
time  of  the  sequential  codes  on  all  120  networks.  In  general,  the  time  required  using  two  Sl-type  trees  is 
about  23%,  26%,  and  36%  of  the  time  using  only  one  SI  tree  on  the  1-100,  1-1000,  and  1-10000  cost  ranges 
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respectively.  Similarly,  two  S2-type  trees  require  on  average  23%,  2T7o,  and  37%  of  the  time  required  by  the 
S2  code  on  1-100,  1-1000,  and  1-10000  cost  range  networks,  respectively. 

Dreyfus  (1969)  comments  that  savings  may  accrue  for  two-tree  algorithms  if  the  stopping  criterion  is 
reached  well  before  Nf'2  nodes  are  permanently  labelled  in  each  tree.  This  is  definitely  the  case  for  random 
networks.  The  one-tree  algorithms  scan  approximately  50.4%  of  the  nodes  in  the  network  until  t  is  placed 
in  the  tree,  while  the  two-tree  algorithms  scan  only  about  4.7%  of  the  nodes  until  one  is  first  placed  in  both 
trees.  That  is,  two-tree  algorithms  scan  about  9.3%  of  the  nodes  scanned  by  one-tree  algorithms,  resulting 
in  the  above  mentioned  savings  in  time. 

It  should  be  noted  that  we  also  solved  the  above  problems  using  the  efficient  label-correcting  code 
THRESH-X2  (see  Glover,  Klingman,  Phillips,  and  Schneider  (1985)).  The  total  times  in  seconds  for  the  1- 
100,  1-1000,  and  1-10000  cost  range  networks  were  877.9,  963.4,  and  976.0,  respectively.  Since  this  algorithm 
solves  the  one-to-all  shortest-path  problem,  it  was  not  included  in  the  tables.  We  can  see,  however,  that  it 
is  more  efficient  to  use  label-setting  algorithms  for  the  one-to-one  problem  since  they  stop  before  the  entire 
tree  is  built. 

When  using  two  processors,  D2  and  S12  parallelize  nicely  as  the  PD2  and  PS12  codes.  PD2  averages 
a  speed-up  of  1.93  over  D2.  PS12  averages  a  speed-up  of  1.93  over  its  sequential  counterpart.  S12.  In  fact, 
on  some  networks  a  speed-up  over  2.0  is  achieved.  This  is  due  to  ties  for  the  minimum  label  node.  The 
sequential  versions  scan  all  nodes  that  tie  in  one  tree  before  moving  to  the  other  tree.  With  two  processors, 
a  node  that  is  first  scanned  from  a  group  with  ties  could  be  the  one  that  is  placed  next  in  the  opposite  tree 
and  the  remaining  tied  nodes  need  not  be  scanned  as  they  are  in  the  one  processor  version.  Less  work  in 
parallel  results  in  a  speed-up  over  2.0.  If  the  sequential  versions  scanned  a  node  from  each  tree  alternately, 
the  speed-up  would  be  less  than  2.0,  but  overall  this  was  slightly  slower  than  scanning  all  nodes  that  tied. 

PS22  averages  only  a  speed-up  of  1.40  over  S22.  This  lower  speed-up  is  due  to  the  additional  cost  of 
using  the  parallel  processing  locks  during  the  relatively  lengthy  mop-up  phase.  That  is,  when  one  processor 
has  locked  a  section  of  code,  the  other  processor  waits  idly  until  the  section  becomes  unlocked  before  it  can 
execute  the  same  section. 

3.  SHORTEST-PATH  HEURISTICS 

There  are  times  when  it  may  be  of  interest  to  quickly  find  “good”  paths  in  a  network.  For  example, 
when  finding  a  starting  solution  to  a  network  flow  problem,  paths  may  be  found  to  send  flow  from  sources 
to  sinks  that  do  not  have  to  be  minimum  paths.  “Good”  paths  at  the  start  may  mean  the  minimum  cost 
flow  will  be  found  more  quickly.  We  find  there  is  a  trade-off  between  the  time  to  find  a  path  and  the  length 
of  the  path  relative  to  a  minimum  path. 

We  have  seen  in  Section  2  that  the  S22  code  is  the  overall  fastest  sequential  code  for  finding  a  shortest- 
path  between  two  nodes.  Recall  that  this  algorithm  requires  a  mop-up  phase  to  scan  all  remaining  unscanned 
arcs  in  the  forward  or  backward  stars  of  the  nodes  in  each  shortest-path  tree  before  a  shortest-path  can  be 
found.  This  mop-up  phase  has  been  found  to  take  from  3%  to  70%  of  the  total  time  for  low  degree  to  high 


degree  networks,  respectively.  However,  before  this  mop-up  phase  we  have  a  node  that  is  in  both  sbortest- 
path  trees  and  a  path  from  s  to  t  implicit  in  the  predecessor  labels.  It  may  not  be  an  optimal  path,  but  it 
is  likely  to  be  good  and  is  found  much  quicker  on  higher  degree  networks.  The  heuristic  code  H2  used  the 
path  implied  by  this  first  node  in  each  shortest-path  tree. 

Following  the  mop-up  phase  in  the  optimal  S22  code,  there  is  a  search  over  all  nodes  in  each  tree  for 
the  one  with  the  minimum  sum  of  its  distance  labels.  This  same  search  may  be  done  at  the  end  of  heuristic 
H2,  without  doing  the  mop-up  phase,  to  see  if  there  is  a  better  path  than  the  one  implied  by  the  first  node 
in  both  trees.  Heuristic  H3  is  identical  to  H2,  but  performs  this  additional  step. 

Paths  between  s  and  t  are  known  before  a  node  appears  in  both  shortest-path  trees.  Heuristic  code  HI 
uses  the  path  implied  by  the  node  that  first  has  a  finite  distance  label  from  both  s  and  t. 

Tables  4-6  show  the  times  for  twenty  problems  per  network  on  the  same  networks  used  in  testing  the 
optimal  algorithms.  Also  shown  is  the  percentage  this  time  was  of  the  optimal  S22  algorithm.  As  expected, 
substantial  savings  in  time  are  realized  on  high  degree  networks,  where  the  mop-up  phase  dominates  the  S22 
times.  On  average,  HI  requires  about  65%  of  S22  time,  while  H2  and  H3  require  73%  and  75%  respectively. 
However  on  average,  HI  ranges  from  81%  of  the  S22  time  on  the  lowest  degree  networks  to  46%  of  S22  time 
on  the  highest  degree  networks.  Similarly,  H2  has  an  average  range  of  93%  to  51%  of  S22  time  and  H3  has 
an  average  range  of  95%  to  52%  of  S22  time. 

Tables  7-9  show  how  good  these  paths  are  compared  to  the  actual  shortest-paths.  On  average,  HI  found 
the  shortest-path  55%  of  the  time.  The  average  length  of  its  path  was  8%  greater  than  the  shortest-path 
and  the  worst  path  found  averaged  44%  greater  than  the  shortest-path.  H2  found  72%  of  the  shortest- 
paths  (reaffirming  the  necessity  of  the  mop-up  phase).  Its  average  path  length  was  2.4%  greater  than  the 
shortest-path  and  the  worst  path  it  found  averaged  19%  greater  than  the  optimal  path.  H3  found  93%  of 
the  shortest-paths  with  an  average  path  length  0.5%  greater  than  the  shortest-path.  Its  worst  path  averaged 
5%  greater  than  the  shortest-path. 

It  should  be  noted  that  in  a  few  instances.  Hi  found  more  of  the  optimal  paths  for  a  given  network  than 
H2.  (See  Table  10,  nodes  =  1000  and  degree  =  5.)  This  is  similar  to  the  case  in  which  the  first  node  placed 
in  both  shortest-path  trees  is  not  necessarily  on  a  shortest-path.  Here,  the  node  that  first  has  two  finite 
labels  (HI)  is  on  a  shortest-path,  but  is  not  the  node  that  is  first  placed  in  both  trees  (H2).  See  Helgason  et 
al.  (1988)  for  an  example  that  demonstrates  these  cases. 

4.  CONCLUSION 

The  objective  of  this  paper  has  been  to  present  four  new  shortest-path  algorithms,  two  sequential  and 
two  parallel,  and  to  empirically  compare  them  with  five  algorithms  previously  discussed  in  the  literature. 
The  new  algorithms  combine  the  highly  effective  data  structures  of  the  SI  and  S2  algorithms  with  the  idea 
of  building  trees  from  a  source  node  and  a  sink  node  in  order  to  find  a  shortest-path.  We  found  that  the 
new  S22  algorithm  was  the  fastest  sequential  algorithm  on  all  networks.  The  new  parallel  algorithm,  PS22, 


was  the  fastest  algorithm  on  all  but  the  lowest  degree  networks,  where  PS12  was  the  fastest.  It  appears  that 
the  parallel  two-tree  Dijkstra  algorithm,  PD2,  might  be  competitive  only  on  very  low  cost,  dense  networks. 

The  secondary  topic  of  this  paper  is  heuristic  S22-type  algorithms  for  obtaining  near-minimum  paths. 
Three  new  heuristic  shortest-path  algorithms  were  discussed  and  were  shown  to  find  very  good  (often  optimal) 
paths  from  a  source  to  a  sink  much  faster  than  the  shortest-path  can  be  found.  These  heuristics  eliminate 
the  time-consuming  mop-up  phase  required  in  the  S22  algorithm  and  are  quite  effective  on  higher  degree 
networks. 


5.  APPENDIX 


Tables  1-3  present  the  computational  results  for  solving  twenty  problems  for  each  of  the  nine  shortest- 
path  algorithms  discussed  above.  Tables  4-6  present  the  computational  results  for  solving  twenty  problems 
for  the  three  S2-type  heuristics  discussed  above.  Tables  7-9  show  how  often  the  heuristics  found  the  optimal 
solution  and  how  far  off  the  solutions  were  when  they  did  not. 
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Table  1.  -  -  Time  in  seconds  for  20  problems  (Cost  range:  1-100) 


code 

nodes 

degree 

D1 

SI 

S2 

D2 

S12 

S22 

PD2 

PS12 

PS22 

5 

7.62 

0.79 

0.84 

6.22 

0.46 

0.46 

3.38 

0.35 

0.41 

10 

5.03 

1.30 

0.94 

5.06 

0.61 

0.54 

2.79 

0.42 

0.43 

15 

3.81 

1.59 

0.90 

3.86 

0.69 

0.48 

2.18 

0.44 

0.42 

20 

3.65 

2.06 

1.25 

3.64 

0.87 

0.55 

2.07 

0.53 

0.46 

1000 

25 

4.17 

3.06 

1.74 

3.52 

1.01 

0.57 

2.00 

0.61 

0.48 

50 

4.03 

4.57 

2.05 

2.50 

1.50 

0.65 

1.49 

0.82 

0.53 

75 

4.67 

6.21 

2.09 

2.27 

1.91 

0.73 

1.36 

1.05 

0.60 

100 

6.65 

9.00 

4.01 

2.50 

2.63 

0.89 

1.36 

1.30 

0.64 

125 

6.74 

9.87 

4.33 

2.10 

2.57 

0.93 

1.25 

1.36 

0.63 

150 

9.46 

13.72 

5.75 

2.53 

3.32 

1.11 

1.40 

1.70 

0.72 

5 

23.02 

2.28 

2.51 

18.50 

0.92 

0.89 

9.61 

0.59 

0.68 

10 

12.88 

3.23 

2.83 

12.98 

1.13 

0.93 

6.80 

0.69 

0.70 

15 

8.73 

3.43 

2.01 

9.67 

1.34 

mm 

5.27 

0.78 

0.72 

20 

9.63 

5.50 

3.32 

8.88 

1.55 

4.82 

0.83 

0.73 

2000 

25 

9.28 

6.55 

3.35 

8.36 

1.89 

4.57 

1.06 

0.78 

50 

10.05 

11.14 

5.31 

5.97 

2.87 

1.21 

3.33 

1.52 

0.92 

75 

12.56 

3.65 

4.69 

3.22 

1.23 

2.65 

1.59 

0.89 

100 

13.37 

18.91 

8.79 

4.35 

4.00 

1.41 

2.42 

2.18 

0.98 

125 

17.01 

23.21 

9.16 

4.53 

5.18 

1.52 

2.51 

2.45 

1.05 

150 

18.43 

27.33 

8.74 

4.81 

5.91 

1.79 

2.62 

2.89 

1.15 

5 

3.23 

3.44 

31.17 

1.30 

■Bl 

15.72 

0.92 

10 

5.00 

3.98 

19.70 

1.55 

Is 

10.38 

0.94 

15 

17.52 

6.86 

4.95 

17.94 

1.96 

m 

9.29 

1.00 

20 

14.28 

8.14 

5.26 

13.39 

2.22 

1.37 

7.13 

1.21 

0.99 

3000 

25 

14.05 

10.11 

5.96 

12.97 

2.82 

1.47 

6.99 

1.37 

1.08 

50 

11.71 

12.90 

4.56 

8.30 

3.37 

1.53 

4.60 

1.71 

1.11 

75 

14.73 

18.52 

5.32 

7.09 

4.44 

1.73 

4.04 

2.26 

1.23 

100 

20.02 

27.96 

9.09 

7.14 

5.94 

1.96 

3.99 

3.04 

1.28 

125 

22.83 

33.82 

12.85 

6.54 

6.93 

3.46 

3.33 

1.33 

150 

23.88 

34.91 

12.52 

5.91 

6.49 

3.33 

3.27 

1.32 

5 

48.97 

4.69 

4.83 

42.68 

1.70 

1.58 

22.13 

0.99 

1.17 

10 

27.81 

6.26 

5.42 

26.38 

2.00 

1.59 

13.73 

1.11 

1.18 

15 

22.66 

8.86 

6.05 

23.14 

2.54 

1.72 

12.77 

1.36 

1.27 

20 

20.28 

11.21 

7.03 

19.05 

2.68 

1.71 

10.11 

1.45 

1.28 

4000 

25 

21.14 

15.66 

9.17 

17.76 

3.32 

1.81 

9.39 

1.71 

1.36 

50 

18.55 

20.51 

8.52 

11.65 

4.49 

1.99 

6.46 

2.18 

1.40 

75 

24.84 

32.71 

14.88 

10.82 

6.69 

2.23 

5.78 

3.13 

1.55 

100 

27.68 

38.53 

13.72 

9.12 

6.84 

2.37 

5.03 

3.24 

1.56 

125 

30.99 

44.23 

13.37 

9.17 

8.66 

2.56 

4.93 

3.90 

1.69 

150 

30.05 

47.07 

10.90 

8.34 

9.63 

2.70 

4.44 

4.39 

1.76 

total 

655.56 

557.49 

235.39 

425.20 

129.15 

55.29 

227.58 

65.59 

39.34 

Table  2.  -  -  Time  in  seconds  for  20  problems  (Cost  range:  i-lOOO) 


code 

nodes 

degree 

D1 

SI 

S2 

D2 

S12 

S22 

5 

1.15 

0.78 

0.67 

4.62 

BBI 

0.54 

10 

20.47 

1.14 

9.82 

0.87 

0.63 

5.20 

mm 

0.53 

15 

1.45 

9.37 

0.98 

0.63 

4.97 

0.54 

20 

15.79 

1.38 

8.56 

1.11 

0.63 

4.55 

0.66 

0.55 

1000 

25 

14.87 

3.11 

1.64 

7.60 

1.15 

0.63 

4.07 

0.69 

0.55 

50 

10.59 

4.77 

1.99 

6.91 

1.83 

0.78 

3.76 

1.04 

0.64 

75 

10.74 

7.26 

2.73 

7.04 

2.50 

0.94 

3.71 

1.26 

0.69 

100 

10.54 

9.24 

2.95 

6.12 

2.91 

1.03 

3.33 

1.54 

0.79 

125 

11.78 

11.68 

4.75 

6.22 

3.58 

1.23 

3.36 

1.80 

0.89 

150 

9.90 

10.46 

3.21 

5.07 

3.31 

1.12 

2.76 

1.72 

0.79 

5 

93.53 

2.19 

27.70 

1.21 

Wm 

14.14 

0.75 

0.80 

10 

64.37 

3.43 

24.63 

1.32 

119 

12.65 

0.80 

0.79 

, 

15 

50.13 

4.35 

3.12 

23.35 

1.53 

in 

12.18 

0.S9 

mm 

20 

41.88 

5.34 

3.31 

20.52 

1.72 

119 

mSm 

0.94 

0.79  1 

2000 

25 

42.08 

7.38 

4.78 

21.40 

2.07 

In 

US 

0.84 

50 

28.76 

12.52 

5.96 

19.29 

3.35 

9.87 

1.70 

0.99 

75 

22.46 

14.42 

6.12 

14.36 

3.85 

Wm 

1.03 

100 

23.51 

19.21 

4.62 

15.13 

5.39 

1.74 

7.92 

2.56 

1.22 

125 

21.25 

20.12 

561 

11.78 

5.02 

1.68 

6.25 

2.55 

1.14 

150 

27.28 

28.97 

8.25 

12.22 

6.68 

6.48 

3.13 

1.29  ! 

5 

157.41 

3.44 

52.47 

1.67 

1.44 

mm 

1.09  1 

10 

113.51 

5.83 

44.69 

1.88 

1.42 

B9 

1.03  1 

15 

84.83 

6.93 

4.14 

43.59 

2.14 

1.40 

21.86 

mm 

1.07 

20 

65.96 

8.14 

4.81 

42.00 

2.64 

1.52 

21.12 

1.37 

1.13 

3000 

25 

63.72 

10.61 

5.97 

39.18 

2.83 

1.50 

19.74 

1.46 

1.13 

50 

39.78 

15.73 

7.43 

25.20 

3.84 

1.63 

13.08 

1.88 

1.22 

75 

41.44 

26.88 

11.87 

27.83 

5.98 

2.06 

14.06 

2.85 

1.38 

100 

34.30 

26.80 

9.63 

20.02 

0.99 

1.98 

10.48 

2.79 

1.36 

125 

34.88 

32.14 

13.63 

18.56 

6.55 

2.05 

9.72 

3.15 

1.39 

150 

38.17 

39.71 

13.01 

17.74 

7.94 

2.26 

9.30 

3.67 

1.42 

5 

269.68 

4.98 

78.45 

2.03 

1.81 

39.54 

1.17 

1..32 

10 

172.27 

8.38 

81.07 

2.51 

1.83 

40.44 

1.37 

1.35 

15 

105.57 

7.47 

4.21 

52.22 

2.40 

1.67 

26.15 

1.33 

1.26 

20 

100.07 

11.86 

6.64 

62.87 

3.17 

1.85 

31.88 

1.64 

1.36 

4000 

25 

81.74 

12.56 

6.36 

56.04 

3.36 

1.89 

28.03 

1.76 

1.41 

50 

54.62 

21.43 

7.01 

40.52 

4.94 

2.05 

20.58 

2.39 

1.48 

75 

51.40 

31.54 

8.64 

36.33 

6.52 

2.36 

18.53 

3.26 

1.65 

100 

51.97 

43.10 

13.61 

34.58 

9.74 

2.93 

17.88 

4.75 

1.81 

125 

39.31 

33.45 

7.67 

26.70 

8.98 

2.84 

13.79 

4.43 

1.84 

150 

51.74 

50.83 

15.91 

24.54 

10.33 

2.99 

12.53 

4.82 

1.93 

total 

2223.33 

573.53 

230.03 

1090.39 

146.60 

61.19 

558.11 

74.04 

42.72 
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Table  3.  -  -  Time  in  seconds  for  20  problems  (Cost  range;  1-10000) 


code 

nodes 

degree 

D1 

SI 

S2 

D2 

S12 

S22 

PD2 

PS12 

PS22 

5 

51.21 

3.51 

11.88 

3.86 

2.57 

6.24 

2.25 

1.97 

10 

42.60 

2.77 

3.01 

1.81 

4.80 

1.81 

1.49 

15 

38.74 

2.46 

10.66 

2.97 

1.63 

5.62 

1.77 

1.39 

20 

40.55 

3.96 

2.58 

10.70 

2.92 

1.52 

5.62 

1.75 

1.32 

1000 

25 

43.22 

4.59 

2.74 

9.99 

2.99 

1.53 

5.30 

1.78 

1.30 

50 

30.62 

5.83 

2.53 

9.21 

3.34 

1.42 

4.88 

1.95 

1.24 

75 

34.97 

9.09 

4.00 

11.77 

4.31 

1.64 

6.16 

2.34 

1.37 

100 

28.98 

10.11 

3.87 

9.32 

4.28 

1.63 

4.93 

Ha 

1.36 

125 

24.48 

10.66 

3.02 

9.21 

4.72 

1.70 

4.87 

HQ 

1.40 

150 

25.63 

12.82 

3.79 

8.10 

4.79 

1.66 

4.33 

1.39 

5 

170.47 

5.02 

4.63 

30.39 

4.32 

2.99 

15.70 

2.47 

2.26 

10 

152.29 

5.32 

4.31 

25.82 

3.57 

2.27 

13.52 

2.11 

1.77 

15 

150.98 

6.23 

4.22 

31.87 

3.70 

2.11 

16.38 

2.13 

1.68 

20 

129.11 

6.69 

4.15 

33.01 

3.88 

2.03 

16.81 

2.21 

1.67 

2000 

25 

109.28 

6.65 

4.19 

26.79 

3.73 

1.90 

13.92 

2.15 

1.57 

50 

101.63 

12.15 

5.21 

28.23 

4.87 

2.01 

14.45 

2.64 

1.60 

75 

90.39 

17.34 

6.67 

28.80 

5.86 

2.13 

3.15 

1.72 

100 

66.55 

17.62 

5.15 

27.10 

6.51 

2.27 

14.11 

3.43 

1.69 

125 

62.73 

21.42 

5.41 

28.49 

7.54 

2.50 

14.11 

3.86 

1.89 

150 

65.88 

28.84 

8.73 

27.35 

8.46 

2.69 

14.11 

4.32 

2.01  ! 

5 

380.23 

6.43 

6.07 

53.09 

2.54 

10 

330.84 

7.48 

5.82 

49.91 

25.16 

2.05 

15 

313.76 

9.13 

6.07 

51.80 

26.47 

2.40 

1.95  ! 

20 

309.35 

11.13 

7.24 

50.05 

4.33 

25.10 

2.47 

1.89 

3000 

25 

213.96 

10.24 

5.24 

47.35 

4.42 

2.26 

24.08 

2.50 

1.86 

50 

165.98 

16.37 

6.01 

52.71 

6.07 

2.43 

26.91 

3.25 

1.93 

75 

155.82 

26.33 

13.15 

59.12 

8.16 

2.82 

30.14 

4.29 

2.12 

100 

127.05 

29.77 

12.26 

51.31 

8.86 

2.92 

26.22 

4.57 

2.16 

125 

81.45 

24.72 

7.66 

35.00 

7.83 

2.54 

17.83 

4.11 

1.90 

150 

106.22 

41.81 

10.07 

41.62 

9.84 

3.00 

20.97 

5.04 

2.15 

5 

741.87 

8.62 

8.65 

108.49 

5.51 

4.02 

53.39 

3.12 

2.94 

10 

386.83 

7.57 

5.51 

61.50 

4.33 

2.92 

30.72 

2.50 

2.24 

15 

527.01 

12.33 

8.74 

89.12 

4.90 

2.90 

44.47 

2.78 

2.27 

20 

416.12 

13.36 

7.33 

82.19 

5.03 

2.75 

41.75 

2.78 

2.15 

4000 

25 

381.66 

15.51 

8.32 

67.48 

4.97 

2.64 

33.89 

2.78 

2.07 

50 

233.50 

21.68 

6.57 

81.93 

7.14 

2.86 

40.43 

3.81 

2.25 

75 

199.77 

29.65 

8.75 

68.66 

8.04 

2.91 

34.36 

4.26 

2.24 

100 

187.13 

43.26 

19.67 

69.30 

9.82 

3.19 

34.60 

5.01 

2.38 

125 

135.70 

39.82 

11.15 

56.98 

9.93 

3.15 

28.98 

5.16 

2.30 

150 

131.56 

47.19 

14.34 

61.04 

12.08 

3.68 

30.88 

6.03 

2.60 

totad 

6986.12 

617.34 

262.56 

1626.92 

224.06 

98.01 

824.07 

121.61 

76.08 

Table  4.  -  -  Heuristic  times  in  seconds  for  20  problems  (Cost  range;  1-100) 


5  10 


.33  0.35 
1.2  63.8 


degree 


20  25  I  50 


0.34  0.35  0.34 
62.2  60.9  52.3 


.66  0.68 

1.6  62.6 


125 

i  150 

0.34 

36.7 

0.34 

31.0 

overall 


1  0.39  0.39  0.41 
4  60.2  53.4  46.2 


m 


.93 


1.11  69.3 

71.9 


0.81 

86.5 


l.is  1.19  l.is 
94.9  95.1  SS.O 


4000  time 
%  of  S22 


0.41 

56.7 


0.84  0.6l  0.(6  0.77  0.77 
76.6  67.2  61.9  54.7  |50.3 


1.15  l.is  1.17  jl.l2  11.18  1.12 
S3.4  180.4  76.3  165.0  !60.1  52.5 


1.47  1.47  1.46 
66.0  61.9  56.9 


0.42  j 
65.9  ! 


0.79  0.80 
43.8  70.8  0.97 


1.11  1.16  1  71.9 
51.7  1 4. 1  I 


1.49  1.49 
55.3  76.1 


Table  5.-  -  Heuristic  times  in  seconds  for  20  problems  (Cost  range:  1-1000) 


degree 

overall 

code 

nodes 

5 

10 

15 

20 

25 

50 

100 

125 

150 

avg 

avg 

1000 

time 

0.49 

0.44 

0.43 

o 

o 

0.40 

0.42 

0.41 

0.40 

0.41 

%  of  S22 

73.7 

70.0 

69.0 

64.0 

62.6 

53.5 

43.1 

38.7 

33.2 

35.4 

.54.3 

2000 

time 

rtKra 

gg 

nKi 

IjQ 

0.72 

Ql 

HI 

%  of  S22 

wnpi 

BWBl 

emBI 

42.7 

37.4 

0.92 

3000 

m 

rag 

1.06 

imgl 

EQ 

1.03 

61.4 

bdB 

69.7 

72.7 

64.9 

IB 

So 

51.5 

45.6 

64.6 

4000 

time 

1.42 

rag 

1.41 

1.37 

1.38 

1.37 

1.37 

1.42 

%  of  S22 

82.8 

77.6 

76.4 

72.5 

67.3 

58.1 

47.8 

47.6 

66.2 

1000 

time 

0.59 

0.53 

Bg 

iiKg 

0.47 

0.48 

ED 

0.45 

%  of  S22 

87.7 

84.8 

Bn 

60.9 

50.9 

45.5 

m 

40.5 

64.1 

2000 

nlbg 

bgi 

gg 

0.84 

0.81 

Em 

H2 

^B 

Hal 

59.7 

48.3 

42.6 

69.1 

1.04 

3000 

time 

ngi 

rag 

1.23 

rag 

IQ 

1.15 

1.14 

1.21 

»»■ 

%  of  S22 

80.9 

ED 

58.1 

55.8 

72.7 

4000 

time 

1.68 

1.65 

1.50 

1.58 

1.55 

1.52 

1.53 

1.61 

1.52 

1.57 

1.57 

%  of  S22 

92.8 

90.3 

89.8 

85.4 

81.8 

74.2 

64.7 

54.9 

■53.3 

.52.5 

74.0 

1 

— 1 

1000 

time 

0.61 

0.56 

0.52 

0.48 

0.50 

0.49 

ED 

0.48 

0.52 

%  of  S22 

91.3 

88.7 

83.1 

BB 

76.1 

64.2 

48.0 

09 

42.6 

67.0 

2000 

time 

1.01 

0.97 

0.90 

0.88 

0.90 

0.91 

riiwi 

0.88 

0.91 

H3 

%  of  S22 

95.5 

95.9 

85.0 

85.8 

79.0 

67.9 

64.3 

52.4 

■50.1 

44.3 

72.0 

1.07 

3000 

time 

1^3 

IQ 

1.26 

|]Q 

IQ 

1.18 

1.17 

IQ 

1.24 

72.3  i 

%  of  S22 

84.1 

EB 

laBM 

59.3 

57.0 

IB 

74.4 

1 

1 

4000 

1.71 

1.69 

1.53 

1.62 

1.58 

1  ..55 

1.56 

1.64 

1.55 

1.60 

1.60 

L^_ 

94.5 

92.3 

91.8 

87.4 

83.6 

75.7 

66.1 

56.1 

54.4 

53.6 

75.6 

1 
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Table  6.-  -  Heuristic  times  in  seconds  for  20  problems  (Cost  range:  1-10000) 


degree 

overall 

code 

nodes 

•5 

10 

15 

20 

25 

_ 1 

50 

75 

avg 

avg 

1000 

time 

2.05 

1.54 

IKn 

1.23 

1.21 

1.04 

1.03 

1.00 

inni 

1.25 

%  of  S22 

79.8 

85.2 

SB 

80.6 

79.2 

kII 

63.7 

63.3 

59.0 

72.6 

2000 

time 

f^l 

ivra 

iPy 

IQI 

1.61 

HI 

%  of  S22 

tall 

1*11 

tEB 

70.6 

1.75 

3000 

time 

1.81 

1.74 

1.68 

1.69 

72.0 

%  of  S22 

SB 

80.1 

71.5 

■59.6 

57.8 

bHI 

4000 

time 

3.32 

2.53 

2.42 

2.22 

2.19 

2.02 

1.37 

1.98 

2.20 

%  of  S22 

82.7 

86.6 

83.4 

80.8 

82.9 

58.1 

62.1 

61.8 

■54.8 

72.4 

1000 

time 

2.46 

1.72 

1.51 

1.37 

1.33 

1.15 

1.16 

1.07 

1.05 

1.39 

%  of  S22 

95.8 

95.2 

92.6 

90.4 

86.9 

67.4 

63.3 

63.3 

80.6 

2000 

time 

2.90 

2.13 

1.94 

1.83 

1.69 

1.55 

1.51 

BB 

1.46 

H2 

%  of  S22 

97.4 

93.7 

89.8 

88.6 

77.3 

M 

.54.4 

1.94 

3000 

time 

IQI 

rag 

2.00 

1.91 

1.92 

1.86 

1.72 

1.77 

2.15 

79.8 

%  of  S22 

BBil 

beH 

SB 

SB 

88.6 

78.7 

68.1 

67.8 

58.9 

79.9 

4000 

time 

rara 

rag 

^1 

1.53 

2.08 

2.19 

2.43 

1 

%  of  S22 

64.7 

66.1 

59.6 

79.9 

1 

1000 

time 

2.48 

1.75 

1.53 

1.40 

1.35 

1.16 

1.19 

1.13 

1.09 

1.08 

1.42 

%  of  S22 

96.7 

96.6 

94.2 

92.0 

88.4 

82.0 

72.6 

69.0 

64.4 

64.7 

82.1 

2000 

time 

2.94 

2.16 

1.97 

1.86 

1.71 

1.58 

1.59 

1.51 

1.50 

1.S3 

! 

H3 

%  of  S22 

98.6 

94.9 

93.5 

91.4 

89.8 

78.7 

74.6 

56.1 

55.6 

80.3 

1.98  ' 

3000 

3.35 

2.60 

gg| 

1.89 

HQ 

1^ 

2.20 

81.4  1 

97.5 

97.1 

E^ 

MM 

64.7 

Hill 

1^9 

81.9 

4000 

time 

3.91 

HHM 

2.52 

ggi 

2.28 

1.56 

2.18 

2.11 

2.22 

2.47 

1 

%  of  S22 

97.3 

ESI 

91.7 

EEB 

79.6 

66.1 

68.3 

67.0 

60.5 

Sl.l 

1 

Table  7.-  -  Solution  data  for  20  problems  (Cost  range:  1-100) 


degree 

overall 

code 

nodes 

■5 

10 

15 

20 

25 

50 

75 

100 

avg 

avg 

%  opt 

55.0 

55.0 

70.0 

65.0 

60.0 

66.5 

1000 

%>opt 

2.6 

11.7 

10.9 

11.8 

6.5 

9.4 

7.8 

4.7 

1.2 

7.3 

worst  % 

38.6 

48.4 

45.9 

50.0 

42.3 

26.7 

37.5 

%  opt 

55.0 

40.0 

80.0 

70.0 

40.0 

56.0 

2000 

%>opt 

9.1 

8.4 

7.6 

4.2 

5.6 

12.8 

7.9 

58.3 

HI 

worst  % 

44.8 

30.5 

50.0 

12.5 

28.6 

26,7 

30.0 

45.5 

38.4 

7.6 

%  opt 

[3tBl 

40.0 

65.0 

55.0 

45.0 

MtM 

55.5 

39.5 

3000 

%>opt 

4.6 

9.1 

12.6 

9.3 

7.2 

7.1 

5.6 

8.0 

worst  % 

29.5 

94.5 

35.1 

46.8 

41.7 

25.0 

44.4 

27.3 

22.2 

42.0 

%  opt 

euM 

Hilil 

HiWil 

55.0 

4000 

%>opt 

3.7 

9.8 

3.4 

6.5 

HSi 

11.0 

11.0 

worst  % 

22.9 

46.6 

21.7 

43.9 

50.0 

57.1 

%  opt 

75.0 

65.0 

80.0 

95.0 

80.0 

85.0 

80.0 

%>opt 

3.7 

2.2 

1.3 

5.0 

3.1 

0.5 

4.8 

3.0 

2.S 

worst  % 

.38.6 

26.2 

8.1 

23.5 

26.7 

9.1 

.33.3 

28.6 

21.7 

1 

■■1 

%  opt 

65.0 

80.0 

60.0 

75.0 

85.0 

90.0 

85.0 

75.0 

76.5 

2000 

%>opt 

2.4 

1.7 

2.7 

1.2 

3.1 

2.7 

1.4 

0.8 

2.0 

4.7 

2.3 

76.1 

H2 

worst  % 

16.8 

22.1 

13.8 

11.3 

16.7 

17.6 

13.3 

12.5 

25.0 

42.9 

19.2 

2.6 

%  opt 

75.0 

Hilil 

70.0 

70.0 

7.3.0 

19.9 

3000 

%>opt 

1.8 

3.6 

Q| 

2.9 

2.3 

2.9 

1.4 

2.8 

2.6 

worst  % 

16.7 

35.0 

19.1 

15.2 

37.5 

6.3 

15.4 

19.5 

%  opt 

65.0 

ESfil 

85.0 

90.0 

75.0 

85.0 

75.0 

%>opt 

4.1 

4.5 

2.6 

1.0 

2.3 

1.0 

2.4 

m 

1.7 

2.5 

worst  % 

35.5 

13.7 

20.0 

7.0 

15.2 

12.0 

40.0 

20.0 

16.7 

11.1 

19.1 

%  opt 

100.0 

90.0 

100.0 

95.0 

100.0 

100.0 

85.0 

100.0 

96.5 

1000 

%>opt 

0.0 

0.7 

0.0 

0.1 

0.0 

0.0 

1.2 

0.0 

0.3 

worst  % 

0.0 

7.1 

0.0 

1.8 

0.0 

0.0 

7.7 

0.0 

2.7 

%  opt 

100.0 

80.0 

95.0 

migi 

92.0 

2000 

%>opt 

0.0 

1.7 

la 

1.8 

1.1 

0.4 

1.0 

1.2 

93.6 

H3 

worst  % 

0.0 

22.1 

la 

12.1 

13.3 

5.6 

9.1 

9.4 

0.6 

%  opt 

95.0 

90.0 

100.0 

90.0 

Hilil 

95.0 

93.0 

7.8 

%>opt 

0.1 

0.5 

0.7 

Bl 

0.0 

1.3 

1.0 

0.7 

worst  % 

2.1 

6.5 

6.2 

8.8 

37.5 

0.0 

14.3 

9.1 

9.1 

9.6 

%  opt 

90.0 

80.0 

95.0 

{ouni 

95.0 

100.0 

95.0 

90.0 

90.0 

93.0 

4000 

%>opt 

0.9 

0.7 

0.0 

0.8 

1.4 

1.2 

0.7 

worst  % 

11.2 

9.6 

9.6 

4.3 

12.0 

0.0 

20.0 

18.2 

11.1 

9.6 

Table  8.-  -  Solution  data  for  20  problems  (Cost  range:  1-1000) 


code  nodes 


5 


55.0 

6.6 

25.7 


%  opt  55.0 
2000  %>opt  4.9 
worst  %  19.5 


%  opt  45.0 
3000  %>opt  10.8 
worst  %  48.4 


%  opt 
4000  %>opt 
worst  % 


%  opt  65.0 
1000  %>opt  2.8 
worst  %  38.0 
%  opt 
2000  %>opt 
H2  worst  % 

%  opt  65.0 
3000  %>opt  2.9 
worst  %  27.3 
%  opt 
4000  %>opt 
worst  % 


%  opt  I  SO.O 
2000  %>opt 
worst  % 


10  15 


60.0 

8.1 

44.7 


70.0 


degree 


20  25  j  50 


60.0  55.0  50.0 
10.9  4.8  11.2 

165.8  20.3  40.7 

70.0  .50.0  50.0 
4.2  5.3  11.9 

31.8  .37.4  57.1 


55.0  70.0 


35.4  31.4  23.7 


55.0  45.0  40.0 
12.8  6.3  12.9 
129.0  35.2  73.7 
55.0  45.0  65.0 
5.6  8.4  11.3 

23.4  .38.2  73.7 


35.0 

8.6 

29.1 


60.0  65.0 
5.7  4.5 

32.9  49.4 


65.0  75.0  70.0 
4.9  2.3  1.8 

49.8  31.5  12.3 


SO.O 

0.8 

12.5 


75.0  75.0 
1.0  1.7 

20.3  12.2 


65.0  80.0  85.0 
4.1  1.4  O.S 

12.8  7.7  8.2 


overall 


150  avg  avg 


50.0  50.5 
10.4  9.7 

70.7  67.0 
40.0  53.5 

13.1  8.0  52.6 

82.8  47.2  8.2 


48.5 

8.8 

42.9 


70.0  170.0 
4.9 
31.3 


70.0. 

80.0 

60.0 

70.0 

85.0 

1.8 

0.4 

3.7 

2.9 

0.2 

4.8 

5.2 

26.4 

21.3 

j  1.3 

100.0 

100.0 

95.0 

90.0 

95.0 

85.0 

0.0 

0.0 

0.1 

0.5 

0.1 

0.9 

0.0 

0.0 

3.0 

8.4 

1.2 

13.9 

iQ 

%  opt 

90.0 

95.0 

95.0 

100.0 

3000 

%>opt 

1.6 

0.1 

0.2 

0.0 

worst  % 

27.3 

1.2 

5.2 

0.0 

8.3  1 

%  opt  I  80.0 
4000  %>opt 
worst  % 


80.0  |90.0 
0.6 
10.4 


Table  9.-  -  Solution  data  for  20  problems  (Cost  range:l-10000) 


degree 

overall 

code 

nodes 

5 

10 

15 

20 

25 

50 

75 

100 

125 

150 

avg 

avg 

%  opt 

40.0 

65.0 

50.0 

50.0 

45.0 

75.0 

1000 

%>opt 

8.4 

12.7 

9.8 

7.3 

6.9 

14.3 

11.8 

3.2 

9.7 

worst  % 

32.0 

90.7 

40.6 

35.9 

54.9 

89.5 

48.1 

26.2 

59.4 

67.0 

%  opt 

iKUl 

55.0 

45.0 

40.0 

35.0 

65.0 

55.0 

45.0 

53.5 

2000 

%>opt 

0.3 

13.3 

13.8 

11.6 

6.2 

11.2 

52.6 

HI 

worst  % 

29.6 

57.0 

35.1 

52.3 

46.0 

40.0 

47.2 

8.2 

%  opt 

MM 

50.0 

65.0 

45.0 

40.0 

65.0 

50.0 

58.0 

46.9 

3000 

%>opt 

9.1 

5.4 

7.5 

9.6 

12.8 

5.7 

13.4 

12.6 

6.9 

6.1 

worst  % 

58.8 

22.2 

52.6 

52.0 

65.3 

42.6 

56.5 

99.6 

%  opt 

55.0 

65.0 

50.0 

60.0 

60.0 

60.0 

Wilil 

48.5 

4000 

%>opt 

8.5 

7.1 

2.9 

8.3 

13.8 

8.5 

6.5 

4.9 

8.9 

5.3 

8.8 

worst  % 

45.4 

35.3 

34.5 

32.4 

35.4 

32.3 

28.7 

42.9 

%  opt 

60.0 

80.0 

1000 

%>opt 

3.3 

0.7 

2.5 

1.1 

3.9 

1.9 

2.3 

2.3 

1.3 

3.3 

worst  7c 

12.3 

7.4 

13.3 

11.1 

16.2 

27.5 

3.5 

11.6 

11.1 

27.4 

7o  opt 

80.0 

75.0 

wnBi 

75.0 

75.0 

^65T' 

65.0 

70.0 

70.0 

2000 

%>opt 

1.2 

1.2 

3.0 

4.9 

3.3 

L5 

1.6 

3.2 

5.0 

2.2 

69.9 

H2 

worst  % 

12.5 

10.0 

17.4 

30.6 

31.8 

10.9 

15.9 

21.6 

65.9 

23.5 

14.4 

2.4 

%  opt 

65.0 

90.0 

80.0 

65.0 

70.0 

100.0 

65.0 

72.5 

20.7 

3000 

%>opt 

3.5 

0.3 

1.7 

1.4 

4.2 

2.5 

0.0 

1.5 

1.9 

worst  % 

21.1 

4.4 

11.4 

33.1 

17.6 

14.3 

7o  opt 

60.0 

75.0 

.55.0 

80.0 

70.0 

70.0 

67.0 

%>opt 

1.4 

2.5 

2.3 

2.7 

2.8 

2.7 

0.4 

2.4 

2.4 

2.3 

2.5 

worst  % 

5.9 

27.1 

16.3 

15.9 

9.3 

14.7 

5.2 

19.2 

17.5 

34.7 

%  opt 

85.0 

95.0 

85.0 

95.0 

95.0 

90.0 

94.0 

j 

%>opt 

0.8 

0.3 

O.S 

0.1 

0.0 

0.5 

worst  % 

6.1 

5.2 

6.3 

1.5 

0.0 

19 

2.2 

6.6 

%  opt 

75.0 

80.0 

90.0 

95.0 

95.0 

92.0 

1 

2000 

%>opt 

1.2 

3.1 

0.7 

0.6 

92.1  1 

H3 

worst  % 

1.6 

6.4 

18.5 

7.9 

7.4 

2.6 

6.9 

Kia 

%  opt 

95.0 

95.0 

95. 0 

HWIli 

90.0 

7.6 

3000 

%>opt 

ran 

ran 

0.2 

liiB 

worst  % 

6.8 

EEI 

7.0 

13.3 

1911 

1.7 

8.3 

%  opt 

95.0 

90.0 

95.0 

85.0 

95.0 

95.0 

90.0 

90.0 

95.0 

88.5 

4000 

%>opt 

0.1 

iSI 

1.3 

litlM 

0.4 

wSm 

0.9 

1.2 

0.6 

worst  % 

Bl 

3.4 

15.9 

0.0 

11.4 

m 

12.8 

17.5 

8.4 
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ABSTRACT 


The  objective  of  this  study  was  to  develop  and  empirically  test  a  shortest  augmenting  path 
algorithm  for  the  transportation  problem.  The  algorithm  maintains  dual  feasibility  and 
complementary  slackness  and  works  toward  satisfying  primal  feasibility.  Sophisticated 
heuristics  and  a  modified  scaling  method  are  used  to  achieve  an  excellent  advanced  start. 
Convergence  is  assured  via  the  use  of  the  shortest  augmenting  path  procedure  using  re¬ 
duced  costs  for  arc  lengths.  The  software  implementation  of  our  algorithm  is  uniformly 
faster  than  the  other  competing  software  on  test  problems  having  a  small  total  supply.  As 
the  total  supply  increases,  the  algorithm  degrades  and  is  slower  than  the  best  competing 
software  based  on  a  specialization  of  the  primal  network  simplex  algorithm.  This  manu¬ 
script  provides  the  strongest  evidence  to-date  which  indicates  that  network  problems  hav¬ 
ing  small  total  supply  can  best  be  solved  via  dual  methods  and  network  problems  having 
large  supply  can  best  be  solved  via  primal  methods. 
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I.  INTRODUCTION 


The  classical  transportation  problem  (also  known  as  the  Hitchcock-Koonmans  trans¬ 
portation  problem'!  is  to  find  a  least  cost  set  of  flows  on  an  uncapacitated  bipartite  graph. 
Mathematically,  this  may  be  formulated  as  the  following  special  mathematical  program: 


minimize 

Z 

i.j 

cy  xy 

(1) 

subject  to: 

z 

j 

Xij  =  Si. 

(i  -  1. 

...,  m) 

(2) 

z 

i 

xy  *=  dj. 

(j  =  1. 

...,  n) 

(3) 

Xij  a:  0, 

(all  i,  j 

) 

(4) 

where  Cij  denotes  the  unit  cost  for  shipments  from  source  i  to  demand  j,  Si  denotes  the 
supply  at  source  i,  dj  denotes  the  demand  at  destination  j,  and  xy  denotes  the  flow  from 
source  i  to  destination  J.  If  ®  feasible  solution;  otherwise, 

it  does  not.  It  is  well  known  that  if  Si  for  all  i  and  dj  for  all  j  are  integers,  then  there 
exists  an  optimal  set  of  flows  which  are  integral. 

The  transportation  problem  plays  a  very  important  role  in  the  area  of  network  pro* 
gramming.  Not  only  are  there  numerous  applications  of  the  transportation  problem,  but 
every  minimal  cost  network  flow  problem  can  be  transformed  into  a  transportation  prob¬ 
lem  (see  Wagner  [1959]).  Hence,  the  algorithm  development  for  the  transportation  prob¬ 
lem  may  benefit  the  development  of  algorithms  for  the  minimum  cost  flow  problem. 

The  transportation  problem  is  a  special  case  of  the  minimum  cost  flow  problem  and 
can  be  solved  by  any  of  the  network  flow  algorithms  which  include  the  following: 

(1)  network  simplex  (Johnson  [1966]), 

(2)  out-of-kilter  (Fulkerson  [1961]), 

(3)  relaxation  method  (Bertsekas  and  Tseng  [1988a]), 


(4)  interior  point  algorithm  (Karmarkar  [1984]),  and 

(5)  shortest  augmenting  path  algorithm  (Iri  [I960]). 

The  network  simplex  algorithm  is  a  specialization  of  the  simplex  method  for  solving 
network  problems.  Because  of  the  special  structure  of  the  constraint  matrix  in  the  net¬ 
work  problem,  the  network  simplex  algorithm  completely  eliminates  the  need  for  carrying 
and  updating  the  basis  inverse.  Furthermore,  by  using  the  concept  of  a  rooted  basis  tree 
and  node  potentials,  the  steps  of  the  simplex  algorithm  can  be  performed  directly  on  a 
network  diagram.  In  addition,  special  strategies  for  entering  variable  selection  have  been 
developed  which  enhances  the  performance  of  the  algorithm.  Computer  software  imple¬ 
menting  these  ideas  have  been  successfully  used  to  solve  large-scale  network  problems. 
Even  though  the  network  simplex  method  does  not  have  a  polynomial  running  time 
bound,  empirical  studies  have  shown  that  its  best  computer  software  implementations  are 
comparable  to  or  better  than  the  other  algorithms’  implementations  for  the  minimum  cost 
flow  problem  (Ahuja,  Magnanti,  and  Orlin  [1989]).  For  a  complete  development  of  the 
network  simplex  algorithm  and  its  computational  complexity  see  Kennington  and  Hel- 
gason  [1980],  Papadimitriou  and  Steiglitz  [1982],  Tarjan  [1983],  and  Ahuja,  Magnanti, 
and  Orlin  [1989]. 

The  out-of-kilter  algorithm  is  based  on  the  Kuhn-Tucker  optimality  conditions  and  is 
composed  of  two  phases:  a  primal  phase  and  a  dual  phase.  During  the  primal  phase  the 
dual  variables  are  held  fixed  and  the  flows  are  modified.  During  the  dual  phase,  the 
primal  variables  are  held  fixed  and  the  duals  are  modified.  The  algorithm  iterates  be¬ 
tween  the  primal  and  dual  phases  until  the  Kuhn-Tucker  optimality  conditions  are  satis¬ 
fied.  For  a  detailed  description  of  the  out-of-kilter  algorithm,  see  Kennington  and  Hel- 
gason  [1980].  For  an  extensive  computational  complexity  study  of  this  algorithm,  see 
Ahuja,  Magnanti,  and  Orlin  [1989]. 


The  relaxation  algorithm  (Bettsekas  and  Tseng  [1988a])  is  a  special  implementation  of 
the  primal-dual  method.  It  is  based  on  iterative  improvement  of  a  Lagrange  relaxation 
functional  of  the  minimum  cost  flow  problem.  The  complementary  slackness  conditions 
are  maintained  at  each  iteration  while  the  duals  are  improved  along  coordinate  ascent 
directions.  Termination  occurs  when  primal  feasibility  is  attained.  A  software  implemen¬ 
tation  of  the  algorithm  (RELAX)  is  described  in  Bertsekas  and  Tseng  [1988b]. 

The  interior  point  algorithm  is  a  new  polynomial-time  bound  algorithm  for  solving  the 
linear  program.  It  takes  a  series  of  steps  though  the  interior  points  of  the  primal  feasible 
region  and  moves  in  a  decent  direction  from  one  interior  point  to  another,  eventually 
converging  to  a  near  optimal  solution.  The  algorithm  has  been  implemented  by  AT&T  in 
a  product  called  the  KORBX®t  system.  For  a  complete  theoretical  development  of  the 
interior  point  algorithm,  see  Karmarkar  [1984],  Vanderbei,  Meketon,  and  Freedman 
[1986],  Karmarkar,  Lagarias,  Slutsman,  and  Wang  [1989].  For  an  empirical  study  of  the 
algorithm,  see  Cheng,  Houck,  Liu,  Meketon,  Slutsman,  Vanderbei,  and  Wang  [1989],  and 
Carolan,  Hill,  Kennington,  Niemi,  and  Wichmann  [1990]. 

The  shortest  augmenting  path  algorithm  (SAP)  is  a  dual  method.  Note  that  the  dual 
problem  of  (1)  -  (4)  is 

maximize  SjAi  +  X  (^) 

>  J 

subject  to :  Cij  -  Ai  -  wj  a  0,  (all  i,  j)  (6) 

where  Aj  and  yij  are  dual  variables  associated  with  supply  node  i  and  demand  node  j, 
respectively.  The  complementary  slackness  conditions  are 

(  Cij  -  Ai  -  Wj)  Xij  .  0,  (alii,  J)  (7) 


t  KORBX  is  a  registered  trademark  of  AT&T. 


and  the  value  of  the  expression  ( Cy  -  Ai  -  stj )  is  known  as  the  reduced  cost,  Oy  ,  for  arc 
(i,  j).  The  SAP  algorithm  applied  to  (1)  -  (4)  maintains  a  set  of  duals  (k.  ir)  and  a  set  of 
flow  assignments  (x)  which  satisfy  (4),  (6),  and  (7).  Systematic  changes  are  made  to  both 
the  duals  and  the  flow  assignments  until  (2)  and  (3)  are  also  satisfied.  When  (2).  (3),  (4), 
(6),  and  (7)  are  all  simultaneously  satisfied,  the  corresponding  flow  assignments  are  an 
optimum  for  (1)  -  (4).  An  extensive  theoretical  study  of  the  shortest  augmenting  path 
algorithm  for  solving  the  minimum  cost  flow  problem  can  be  found  in  Tomizawa  [1971], 
Papadimitriou  and  Steiglitz  [1982],  Tarjan  [1983],  and  Ahuja,  Magnanti,  and  Orlin 
[1989]. 

In  this  paper,  we  present  a  new  algorithm  for  solving  the  transportation  problem  which 
is  a  combination  of  heuristic  procedures  coupled  with  SAP.  The  algorithm  consists  of  five 
phases:  column  reduction,  reduction  transfer,  row  reduction  augmentation,  scaling,  and 
shortest  path  augmentation.  The  column  reduction  is  a  simple  heuristic  which  produces 
an  advanced  start.  If  this  heuristic  solves  the  problem,  then  the  algorithm  terminates. 
Otherwise,  two  sophisticated  heuristics  are  applied  (reduction  transfer  and  row  reduction 
augmentation)  in  an  attempt  to  obtain  additional  flow  assignments.  If  the  advanced  heu¬ 
ristics  produce  an  optimal  solution,  the  algorithm  terminates.  Otherwise,  a  modified 
scaling  technique  is  used  to  obtain  additional  flow  assignments.  If  the  scaling  procedure 
produces  an  optimal  solution,  the  algorithm  terminates.  Otherwise,  a  shortest  augmenting 
path  method  is  used  to  complete  the  last  flow  assignments.  The  steps  of  the  SAP  algo¬ 
rithm  can  be  described  as  follows: 

SAP  Algorithm  for  the  Transportation  Problem 
step  1.  Initialization  and  column  reduction  heuristic 
step  2.  If  (2)  and  (3)  are  satisfied,  then  terminate 
step  3.  Reduction  transfer  heuristic 

step  4.  Scaling  heuristic 


step  S.  If  (2)  and  (3)  are  satisfied,  then  tenninate 
step  6.  Row  reduction  augmentation  heuristic 
step  7.  If  (2)  and  (3)  are  satisfied,  then  terminate 
step  8.  Shortest  path  augmentation. 

This  algorithm  is  a  generalization  of  our  previous  work  on  the  assignment  problem 
(Kennington  and  Wang  [1989a])  and  the  semi-assignment  problem  (Kennington  and 
Wang  [1989b]). 

II.  COLUMN  REDUCTION 

The  column  reduction  heuristic  generates  an  initial  set  of  flows  (x),  and  duals  (X.,  ir) 
that  satisfy  (4),  (6),  and  (7).  Note  that  (6)  implies  that  for  each  j, 

^  Cii  -  Xi  ,  for  all  i.  Hence,  we  begin  with  Ai  «  0  for  all  i,  Xy  «  0  for  all  i,  j 
and  Jtj  *  min  { cy :  1  ^  i  ^  m }  for  all  j.  For  those  pairs  (i,  j)  such  that 
Cy  -  Ai  -  ^tj  «  0,  the  xy  is  set  to  the  largest  value  possible  such  that  the  total  flow  origi¬ 
nating  at  source  i  does  not  exceed  Si  and  the  total  flow  to  destination  j  does  not  exceed 
dj .  Obviously  dual  feasibility  and  complementary  slackness  conditions  are  preserved  by 
this  procedure. 

Proposition  1.  The  time  bound  for  the  column  reduction  is  0(mn). 

At  the  termination  of  the  COLUMN  REDUCTION  procedure  (4),  (6)  and  (7)  will  be 
satisfied.  If  (2)  and  (3)  are  also  satisfied,  then  an  optimum  has  been  found.  However, 
this  never  occurred  in  our  empirical  experiments  and  additional  work  was  required.  In 
the  column  reduction,  the  cost  range  is  a  crucial  factor  which  affects  the  amount  of 
supply  that  can  be  assigned  to  destination  nodes.  In  general,  the  smaller  the  cost  range, 
the  more  flow  that  can  be  assigned  to  destination  nodes.  One  experiment  showed  that  for 
a  cost  range  of  from  0  to  100,  almost  75%  of  the  total  supply  was  assigned  to  destination 


nodes  by  the  column  reduction  heuristic;  while,  for  a  cost  range  of  from  0  to  100,000, 
only  one-half  of  the  total  supply  was  assigned.  Of  course,  the  unassigned  supply  can  be 
assigned  by  generating  a  sequence  of  shortest  augmenting  paths  from  source  nodes  with 
unassigned  supply  to  destination  nodes  with  unassigned  demand.  However,  since  the 
shortest  path  augmentation  is  computationally  expensive  (we  will  discuss  this  in  Section 
V),  there  is  great  motivation  to  use  this  heuristic  to  assign  as  much  flow  as  possible 
before  using  the  shortest  augmenting  path  procedure. 

In  the  following  two  sections,  we  present  two  other  more  sophisticated  heuristics:  re¬ 
duction  transfer  and  row  reduction  augmentation.  The  application  of  these  two  proce¬ 
dures  can  result  in  the  assignment  of  additional  supply  at  a  very  reasonable  computational 
expense. 


III.  REDUCTION  TRANSFER 

At  the  termination  of  the  column  reduction  procedure,  a  forest  can  be  formed  from 
the  arcs  having  nonzero  flow.  A  forest  composed  of  four  trees  (components)  is  illustrated 
in  Figures  1,  2,  and  3.  Note  that  for  each  tree,  at  most  one  node  has  remaining  supply  or 
demand.  Since  (7)  is  satisfied,  the  reduced  cost  for  each  arc  in  each  tree  is  zero. 
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Figure  1.  Sample  Transportation  Problem.  Figure  2.  Cost  Matrix. 

({•}  gives  the  supply  (demand)  at  source  (*  shows  min{cij}  in  the 

(destination)  node.)  column  reduction.) 


Figure  3.  The  Forest  Associated  with  the  Sample  Transportation  Problem. 

((*)  gives  the  remaining  supply  (demand)  at  source  (destination)  node  and  [*] 
denotes  the  flow  on  the  corresponding  arc.) 


Let  C  be  a  component  with  remaining  supply.  Let  L  {ii,  is)  and 
Jc  ■  {ji,  ••••  jt)  be  the  set  of  supply  and  demand  nodes  in  C,  respectively.  The  reduction 
transfer  procedure  for  the  supply  component  C  reduces  dual  variable  ttj  of  j  by  some 
amount  A  >  0  ,  for  all  j  E  Jc ,  and  for  all  i  E  L  increases  ^  by  A.  Dual  feasibility 
considerations  determine  an  upper  bound  for  A. 


Proposition  2.  Let  (X,  ir,  x)  denote  a  set  of  vectors  that  satisfy  (6)  and  (7).  Let  C  be 
a  component  with  a  set  of  supply  nodes  «  {ii,  if}  and  a  set  of  demand  nodes 
Jc  *  {ji.  •••.  jt}  •  Select  A  such  that  0  ^  A  ^  miniEi^  [  mink^j^  (<»&)]•  Let 
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and 


Wk 


{ 


otherwise 


jTk  -  A,  k  e  Jc 
jTk,  otherwise. 


Then  (A,  jr,  x)  satisHes  (6)  and  (7). 


The  proof  of  Proposition  2  may  be  foiuid  in  Wang  [1990]. 

The  reduction  transfer  procedure  is  presented  below: 

Procedure  REDUCTION  TRANSFER 
begin 

1.  I  ^  {1,  ....  m},  J  ♦-  {1,  ....  n},  and  let  11  denote  the  set  of  supply 
components  obtained  from  Procedure  COLUMN  REDUCTION; 

2.  for  all  C  €  n 

3.  Ic  ^  {i:  i  G  C  and  i  G  1}  and  Jc  *-  {j:  j  G  C  and  j  G  J}; 

4.  A  *-  min  {  <o[i,  j]:  i  G  I*  and  j  G  J-Jc  }; 

5.  7r[j]  ♦-  irO]  -  A,  for  all  j  G  L; 

6.  X.[i]  ♦-  X[il  +  A,  for  all  i  G  L; 

7.  end  for 
end 

Note  in  step  1,  we  only  apply  the  reduction  transfer  to  supply  components  which 
insured  that  the  dual  functional  will  not  be  reduced. 


Proposition  3.  The  time  bound  for  the  reduction  transfer  procedure  is  0(mn). 


Proof.  In  the  worst  case  obtaining  the  final  value  of  A  takes  0(1 !« 1  1  J  -  Jc  |)  time,  and 
since  X I  ^  I  ^  1  J  -  Jc  1  ^  n,  the  time  bound  for  the  procedure  is  0(mn).  ■ 

Even  though  the  reduction  transfer  does  not  directly  reduce  the  unassigned  supplies,  it 
may  help  to  achieve  more  flow  assignments  in  the  following  row  reduction  augmentation 
and  shortest  path  augmentation  procedures.  That  is,  after  the  reduction  transfer,  a  satu¬ 
rated  demand  node  j  is  more  expensive  to  the  supply  nodes  which  are  not  in  the  same 
component,  and  an  unexhausted  supply  node  is  less  expensive  to  the  demand  nodes  which 
are  not  in  the  same  component. 


IV.  ROW  REDUCTION  AUGMENTATION 

The  row  reduction  augmentation  is  another  heuristic  procedure  that  further  reduces 
the  unassigned  supply  while  maintaining  dual  feasibility  as  well  as  the  complementary 
slackness  conditions.  As  we  mentioned  earlier,  the  saturated  demand  nodes  are  more 
expensive  to  unassigned  supply  nodes  which  are  not  in  the  same  component  and  unas¬ 
signed  supply  nodes  are  less  expensive  to  demand  nodes  after  the  reduction  transfer; 
therefore,  an  application  of  the  row  reduction  augmentation  is  more  likely  to  achieve 
additional  flow  assignments. 

In  a  classical  row  reduction,  for  each  unassigned  supply  node  i,  we  find  8  «=  min  { cja^ : 

#  k  «  1,  ...,  n  }  and  increase  A]  by  d  .  Then  if  there  exists  a  demand  node  j  such 

that  (Oil  "  ^  unsaturated,  we  assign  f  units  of  flow  from  source  node  i  to  destina¬ 

tion  node  j,  where  f  »  min  {  Si  -  xikt  dj  -  xy  },  Dual  feasibility  and  the  com- 

•  plementary  slackness  conditions  are  maintained  and  we  reduce  the  total  unassigned  sup¬ 
ply  by  f. 

Proposition  4.  Let  I  =  {1,  ...,  m},  and  J  *  {1,  ....  n},  and  (X.,  it,  x)  denote  a  set  of 
^  vectors  that  satisfy  (6)  and  (7).  Select  an  i  e  {1:  Xk  <  Si,  1  G  1}  and  let  d  =  min 
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{ :  k  €  J}.  Let 


I  ;ii  d.  I  »  i 

A,  -  i 

I  Ai,  otherwise. 

Select  a  j  £  J  such  that  (Uy  -  d  . 

If  Xy  <  dj ,  then  let  f  =  min  { Sj  -  Xik,  dj  -  Xy }  and 

I  xu  +  f,  if  1  =  i  and  k  =  j 

Xlk  ^  1 

I  xuc,  Otherwise. 

If  xy  =  dj,  select  a  p  £  I  such  that  p  ^  i  and  Xpj  ^  0  ,  then  let 
f  a  min  { Sj  -  2k  Xik,  Xpj }  and 


Xy  +  f,  if  1  =  i  and  k  «  j 
Xpj  -  f,  if  1  =  p  and  k  =  j 
Xu,  otherwise. 


Then  (X,  ;r,  x )  satisfies  (7). 


The  proof  of  Proposition  4  may  be  found  in  Wang  [1990]. 


In  our  study,  we  modified  the  classical  row  reduction  procedure  by  allowing  the  flow 
assignment  to  a  saturated  demand  node  to  be  changed  from  one  source  node  to  another 
(as  described  in  Proposition  4).  Also,  we  embedded  the  reduction  transfer  procedure 
within  the  row  reduction,  so  that  the  deassigned  source  node  may  find  another  unsatu¬ 
rated  demand  node  in  the  next  round  of  the  application  of  row  reduction.  We  call  this 
modified  procedure  the  row  reduction  augmentation. 

Let  rs[i]  and  rd[j]  be  the  remaining  supply  and  demand  at  supply  node  i  and  demand 
node  j,  respectively.  The  row  reduction  augmentation  procedure  may  be  described  as 
follows: 
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Procedure  ROW  REDUCTION  AUGMENTATION 
begin 

1.  I  *-  {1,  m}  and  J  *-  {1,  ....  n}; 

2.  UNEXHAUSTED  ^  {i  :  rs[il  >  0.  i  e  I }; 

3.  for  all  i  G  UNEXHAUSTED 

4.  8  ^  min  {(i)[i,  j]:  j  G  J}; 

5.  JCAND  {j  :  a)[i,  j]  »  8  and  j  G  J}; 

6.  for  all  j  G  JCAND 

7.  if  rd[j]  >  0,  then 

8.  f  ^  min  {rs[i],  rd[j]} 

9.  x[i,  j]  <•-  x[i,  j]  +  f,  rs[il  —  rs[i]  -  f, 

and  rd[j]  *-  rd[j]  -  f;  (flow  assignment  from  i  to  j) 

10.  end  if 

11.  if  rs[i]  =  0,  go  to  27; 

12.  end  for 

13.  for  all  j  G  JCAND 

14.  Ij  *-  {1:  x(l,  j]  ^  0,  Is^  i,  and  I  G  I}; 

15.  if  Ij  ^  go  to  18; 

16.  end  for 

17.  go  to  27; 

18.  for  all  1  G  Ij 

19.  M.[l]  ♦-  min  {(i)[l,  k]:  k  G  J,  k^j}; 

20.  end  for 

21.  0  —  min  {^[1]:  1  G  Ij }; 

22.  LCAND  —  {1:  n[I]  =  0  and  1  G  Ij  }; 
select  a  p  G  LCAND; 


23. 


25. 


xIp*  il^  *Ip*  j]  “  ^  *'s[pl*“  rslp]  +  f;  (deassign  f  units  of  flow  from  p  to  j) 

26.  x[i,  j]^  x[i,  j]  +  f  and  rs[i]  ♦-  rs[i]  -  f;  (assign  f  units  of  flow  from  i  to  j) 

27.  \[i]  —  Mil  +  8; 

28.  if  rs[i]  >  0,  then 

29.  do  steps  4  through  7  of  Procedure  REDUCTION  TRANSFER; 

30.  end  if 

31.  end  for 
end 

Proposition  5.  The  time  bound  for  the  row  reduction  augmentation  is  0(  m^n  ). 

Proof.  Let  J  *  {1,  ....  n}.  For  each  i  e  UNEXHAUSTED,  obtaining  the  value  of  8  =  min 
{ Oiic :  k  G  J}  and  the  index  j  takes  0(n)  time.  Because  |  Ij  |  ^  m  and  obtaining  the 
value  of  4(1]  «  min  {wfl,  k]:  k  e  J,  k^j}  for  eachl  G  Ij  takes  0(n)  time,  obtaining  the 
value  of  0  takes  0(mn)  time.  Therefore,  the  time  bound  for  the  row  reduction  augmenta¬ 
tion  on  the  supply  node  i  is  0(mn).  Since  at  most  m  supply  nodes  are  left  unassigned,  the 
total  time  for  the  procedure  is  0(  m^n ).  ■ 

The  effect  of  the  row  reduction  augmentation  in  reducing  the  total  unassigned  supply 
is  extraordinary;  particularly,  for  problems  with  a  large  cost  range.  This  is  due  to  the  fact 
that  the  reduction  transfer  reduces  the  value  of  the  dual  variable  associated  with  the 
demand  node  j;  so  consequently,  the  source  node  p  which  was  deassigned  from  j  is  likely 
to  be  immediately  reassigned  to  another  demand  node.  For  a  test  problem  having  400 
sources,  400  destinations,  total  supply  of  100,000,  and  a  cost  range  of  [0,  100000],  54,000 
units  of  total  supply  remained  unassigned  after  the  column  reduction.  A  single  applica¬ 
tion  of  the  row  reduction  augmentation  procedure  resulted  in  almost  20,000  units  of  new 
flow  assignments.  A  second  application  of  this  procedure  yielded  about  10,000  units  of 
additional  flow  assignments.  For  all  the  problems  we  tested,  the  total  unassigned  supply 


is  reduced  to  about  20%  of  the  total  supply  after  the  procedure  is  applied  four  times.  We 
also  found  that  the  larger  the  cost  range,  the  more  benefit  can  be  achieved  by  multiple 
applications  of  the  procedure.  Of  course,  we  can  apply  the  row  reduction  augmentation 
procedure  as  long  as  the  flow  is  being  assigned.  However,  an  inordinate  amount  of  time 
can  be  spent  in  switching  supply  assigned  to  a  saturated  demand  node  when  the  proce¬ 
dure  tries  to  complete  the  last  few  flow  assignments.  Also,  this  heuristic  may  never 
achieve  a  complete  flow  assignment.  Hence,  the  row  reduction  procedure  should  be 
terminated  after  a  given  number  of  applications.  Based  on  our  experience,  we  recom¬ 
mend  application  of  the  row  reduction  augmentation  procedure  T  times  where 
T  =  max  {  1,  B'lTC  logio  [  max  (  Cy  :  for  all  i,  j  )  ]  -  1  )  }  and  INT(a)  denotes  the 
largest  integer  such  that  INT(oi!)  ^  a. 


V.  SHORTEST  AUGMENTING  PATH 

The  shortest  augmenting  path  procedure  is  used  to  complete  the  last  few  flow  assign¬ 
ments.  In  each  iteration,  this  procedure  selects  an  unexhausted  supply  node,  say  i,  and 
builds  a  shortest  path  from  i  to  an  unsaturated  demand  node  using  the  reduced  cost  for 
arc  lengths.  Then,  it  augments  some  amount  of  flow  f  along  this  path  by  assigning  f 
additional  units  of  flow  from  all  supply  nodes  on  the  path  to  their  succeeding  demand 
nodes,  and  deassigning  f  units  of  flow  from  all  supply  nodes  to  their  proceeding  demand 
nodes.  The  total  flow  is  increased  by  f  with  each  application  of  this  procedure.  We  use  a 
label  setting  algorithm  to  obtain  the  shortest  path  and  stop  the  procedure  whenever  an 
unsaturated  demand  node  has  been  permanently  labeled. 

Proposition  6.  The  shortest  augmenting  path  procedure  has  a  time  bound  of 


O  ( Umn ) ,  where  U  is  the  total  supply. 


The  proof  of  Proposition  6  and  a  detailed  description  of  our  implementation  of  the  label 
setting  algorithm  and  dual  updating  procedure  may  be  found  in  Wang  [1990]. 

From  Proposition  6,  we  can  see  that  using  the  shortest  augmenting  path  (SAP)  proce¬ 
dure  to  solve  the  transportation  problem  may  be  very  expensive  when  the  total  supply  is 
large.  For  example,  let  i  be  an  unexhausted  supply  node  with  4  units  of  remaining 
supply.  In  the  worst  case,  it  probably  requires  the  development  of  four  shortest  path  trees 
in  order  to  complete  these  4  units  of  flow  assignments.  Consider  the  shortest  path  tree 
illustrated  in  Figure  4.  Supply  node  i  has  4  units  of  remaining  supply  and  demand  node  j 
has  3  units  of  unsatisfied  demand.  However  we  can  not  send  3  units  of  flow  through  the 
path  {i.  h.  *2.  j }  because  the  residual  capacity  of  the  arc  (h.  b  )  is  1.  Suppose  f  is  the 
amount  of  flow  which  can  be  sent  along  the  shortest  augmenting  path.  Clearly, 
f  s  min  {  Si  -  ^  Xik  ,  dj  -  X  ^  •  If  ^  =  Si  “  X  .  then  supply  node  i  is  exhausted 
and  one  will  select  another  unexhausted  supply  node,  if  any,  to  continue  the  work.  If 
f  <  Si  -  ^Xik,  then  we  can  reuse  part  of  the  shortest  augmenting  path  tree.  The  follow¬ 
ing  proposition  presents  the  theory  we  use  in  saving  part  of  the  shortest  path  tree. 


Figure  4.  Sample  Shortest  Path  Tree 

({*}  gives  the  remaining  suppiy(demand)  at  root  (tail)  node  and  [*] 
denotes  the  flow  or  residual  capacity  on  the  corresponding  arc.) 

Proposition  7.  Let  I  «  {1,  ...,  m},  J  =  {1,  ...,  n},  and  (X,,  tt,  x)  denote  a  set  of  vectors 
that  satisfy  (6)  and  (7).  Select  an  i  €  { 1:  <  s[l],  1  E  1}  and  let  q  be  an  unsatu- 


rated  demand  node.  Let  S  »  { i,  ji,  ii,  ....  jt,  it,  q }  be  the  shortest  alternating  path 
from  i  to  q.  Suppose  that  dist[j]  is  the  distance  label  of  j  when  the  procedure  is  termi¬ 
nated;  P  is  the  set  of  demand  nodes  with  permanent  distance  labels;  and  Q  is  the  set  of 
supply  nodes  whose  predecessors  are  in  P.  Let  d  be  the  current  minimum  distance  label 
of  demand  nodes  in  the  set  T,  and  let  f  be  the  maximum  flow  that  can  be  augmented 

along  the  shortest  path  S,  i.e.  f  »  min  {  Si-  ,  dj  -  ^xy  ,  x  .  x  y, }.  Let 

jpred[j]  be  the  predecessor  of  j,  j  €  J ;  and  ipred[il  be  the  predecessor  of  i,  i  E  I .  Let 


A, 


Ai  +  d,  1  =  i 

A,  -  distjipredll]]  +  <5,  1  E  Q  \  {i} 
Ai,  otherwise, 

Jii  +  dist[k]  -6,  k  E  P 
Wk,  otherwise. 


K  f  <  Si-^Xik  and  f  <  min  {x^j,  ...,  xy, },  then  let 

0,  kEP 
dist[k],  otherwise. 


dist[k] 


P*-P,  t^-T,  Q*-Q,  jpred«-jpred,  ipred«-ipred,  and  d«-0. 
If  f  «  min  {Xi,j,,  ....  Xy,  )and  f  <  Sj-^x^,  then  let 
h  «  min  {  r:  x  y,  »  f  and  r  «  1,  ...,  t  }, 

Ph  ♦“  { k  :  kEP  and  k  became  permanently  labeled  before  jh},  and 
Qh  ^  { 1 :  1 »  i  or  1  E  Q  and  ipred  [1]  E  Ph).  Let 


ipred[l] 


ipred[l],  1  E  Qh 
0,  otherwise. 


E-17 


jpredik] 


j  jpred[k],  k  G  Ph 
i  jpred[k],  k  Pj,  and  jpred[k]  G  Qh 


y  1,  where  1  G  { cDik  »  di*st[kl,  1  G  Qh },  k  ^  Ph  and  jpred[k]  ^  Qh, 

10.  k  G  Ph 

dist[k]  -6,  k  ^  Ph  and  jpred[k]  G  Qh 

min  {<wik  :  1  G  Qh  },  k  ^  Ph  and  jpred[k]  ^  Qh, 

and  P  Ph,  t  ♦-  {  k  :  dist[k]  «  0,  k  G  J\Ph  },  Q  Qh,  and  6*-  0. 

Then  dist  is  a  set  of  valid  distance  labels,  i.e.,  P  is  the  set  of  demand  nodes  with  dist  as 
new  permanent  distance  labels;  Q  is  the  set  of  supply  nodes  whose  predecessors  are  in 
P  with  the  root  node  i;  d  is  the  current  minimum  distance  label  of  demand  nodes  in  the 
set  t ;  jpred[j]  is  the  predecessor  of  j;  and  ipred[i]  is  the  predecessor  of  i. 


A  detailed  proof  of  Proposition  7  may  be  found  in  Wang  [1990]. 

We  have  implemented  Propositions  7  in  our  algorithm,  so  that  the  procedure  will 
continue  to  use  the  shortest  augmenting  path  spanning  tree  rooted  at  supply  node  i  until 
its  supply  is  exhausted.  The  empirical  analysis  showed  that  by  doing  so  it  saves  about 
one-third  of  the  time  required  for  this  procedure. 


VI.  THE  ALGORITHM 

The  shortest  augmenting  path  algorithm  for  the  transportation  problem  can  be  stated 


as  follows: 


Procedure  SAP_T 
begin 

1.  Perform  a  COLUMN  REDUCTION  as  described  in  Section  D; 

2.  If  an  optimum  has  been  obtained,  then  terminate; 

3.  Perform  a  REDUCTION  TRANSFER  as  described  in  Section  ni; 

4.  Perform  a  ROW  REDUCTION  AUGMENTATION  as  described  in  Section  IV; 

5.  If  an  optimum  has  been  obtained,  then  terminate; 

6.  Perform  the  SHORTEST  AUGMENTING  PATH  Procedure  as  described  in 
Section  V  until  an  optimum  has  been  obtained. 

7.  end 

Theorem  1.  Let  U  be  the  total  supply  and  suppose  U  ^  max  {m,  n}.  Then  the 
shortest  augmenting  path  algorithm  (Procedure  SAP_T)  will  achieve  an  optimal  solution 
in  0(Umn)  time. 

Proof.  From  Propositions  1,  3,  5,  and  6,  the  column  reduction  takes  0(mn)  time,  the 
reduction  transfer  takes  0(mn)  time,  the  row  reduction  augmentation  takes  0(  m^  n ), 
and  the  sh  ortest  path  augmentation  takes  0(Umn).  Since  U  ^  m.  SAP_T  has  a  time 
bound  of  0(Umn).  ■ 

This  algorithm  has  been  implemented  in  a  FORTRAN  code  called  SAP_T.  It  stores 
the  costs  in  a  single  dimensioned  array  of  length  mn.  In  addition  it  uses  six  m  length 
arrays,  eight  n  length  arrays,  and  four  arrays  of  length  max{m,  n}. 

The  codes  which  we  obtained  for  comparison  are  RELAX  (Bertsekas  and  Tseng 
[1988a  and  1988b]),  NETFLO  (Kennington  and  Helgason  [1980]),  and  NETSTAR  devel¬ 
oped  by  Richard  S.  Barr  of  Southern  Methodist  University.  RELAX  is  a  special  imple¬ 
mentation  of  the  primal-dual  algorithm.  Both  NETFLO  and  NETSTAR  are  special  imple¬ 
mentations  of  the  network  simplex  method.  Kennington  and  Helgason  have  compared 
their  code  NETFLO  with  RNET  which  is  a  network  simplex  code  developed  by  T.  Hsu  and 


M.  Grigoriadis  at  Rutgers  University  and  the  empirical  results  showed  that  NETFLO  is 
about  4%  slower  than  RNET  on  comparable  problems.  We  believe  that  NETSTAR  is  one 
of  the  world’s  fastest  pure  network  codes. 

Tables  1  through  S  present  our  empirical  results  with  RELAX,  NETFLO,  and 
NETSTAR  on  eighty  randomly  generated  problems.  The  total  supply  varies  from  1000  to 
100,000.  The  column  entitled  size  gives  the  number  of  sources  and  number  of  destina¬ 
tions  so  that  the  number  of  nodes  is  always  double  the  size.  The  problems  were  all  dense 
so  that  the  number  of  arcs  is  the  size  squared.  None  of  these  codes  exploited  the  fact  that 
the  problems  were  dense.  The  test  runs  were  performed  on  a  Sequent  Symmetry  S81  at 
SMU.  This  machine  is  configured  with  32  Mbytes  of  memory  and  runs  a  modified  ver¬ 
sion  of  UNIX’"t.  A  summary  of  Tables  1  through  5  is  presented  in  Figure  5. 
NETSTAR  is  clearly  the  fastest  of  the  three  codes.  Both  NETSTAR  and  NETFLO  are 
very  robust  over  a  wide  range  of  parameters.  RELAX  is  very  sensitive  to  the  total  supply 
and  required  more  than  twice  the  time  to  solve  these  eighty  test  problems  as  did 
NETSTAR. 


t  UNIX  is  a  trade  mark  of  AT&T  Bell  Laboratories. 


(seconds) 


Figure  5.  Performance  of  RELAX,  NETFIX),  and  NETSTAR. 

Each  time  is  a  sum  of  solution  times  of  16  mxm  transportation  problems,  in  which 

m  ranges  from  100  to  400,  maximum  cost  ranges  from  100  to  100,000. 

Total  solution  times  for  80  problems: 

RELAX  -  3150.18  (secs.),  NETFLO  -  2418.69  (secs.),  NETSTAR  -  1362.03  (secs.). 

Tables  6  through  10  present  our  empirical  results  comparing  NETSTAR  with  SAP_T. 
The  shortest  augmenting  path  code  is  very  sensitive  to  the  total  supply  and  was  dominated 
by  NETSTAR  on  four  of  the  five  test  sets,  l^ere  as  NETSTAR  requires  approximately 
the  same  time  for  each  of  the  five  test  sets  (we  attribute  this  fact  to  the  unique  feature  of 
the  primal  simplex  method  which  starts  with  a  feasible  solution  in  the  process  of  finding 
an  optimum),  SAP_T  requires  3.5  times  as  much  time  to  solve  the  fifth  test  set  as  the 
first.  This  is  consistent  with  the  complexity  result  presented  in  Theorem  1.  The  perform¬ 
ance  of  SAP_T  and  NETSTAR  as  a  function  of  total  supply  is  illustrated  in  Figure  6. 


(seconds) 


Figure  6.  Performance  of  SAP_T  and  NETSTAR  as  a  function  of  total  supply. 

Each  time  is  a  sum  of  solution  times  for  16  mxm  transportation  problems,  in 
which  m  ranges  from  100  to  400,  maximum  costs  range  from  100  to  100,000. 


VII.  THE  SCALING  TECHNIQUE 

One  technique  which  can  minimize  the  effect  of  the  total  supply  on  the  performance 
of  SAPjr  is  the  scaling  technique.  The  idea  is  to  modify  the  shortest  augmenting  path 
procedure  so  that  larger  augmentations  are  possible  during  the  early  stages  of  the  algo¬ 
rithm.  Hopefully,  fewer  augmentations  will  be  required  to  attain  optimality.  The  scaling 
technique  we  use  is  called  the  Edmonds-Karp  scaling  method  (Edmonds  and  Karp 
[1972])  or  right-hand-side  scaling  (Orlin  [1988]). 

The  shortest  augmenting  path  algorithm  with  the  scaling  feature  for  solving  the  trans¬ 
portation  problem  has  been  implemented  in  a  FORTRAN  code  called  SAP_S JT  and  com¬ 
pared  with  NETSTAR.  The  empirical  analysis  is  presented  in  Tables  11  through  15. 
SAP_S_T  was  some  35%  faster  than  SAP_T  but  was  still  sensitive  to  the  total  supply.  For 


the  test  runs  with  total  supply  of  1,000  and  5,000,  SAP_S_T  was  faster  than  NETSTAR. 
For  the  test  runs  with  total  supply  of  10,000,  50,000,  and  100,000,  NETSTAR  dominated 
SAP_S_T.  A  summary  of  all  the  empirical  results  with  all  codes  is  illustrated  in  Figure  7. 
(seconds) 


Figure  7.  Performance  of  RELAX,  NETFLO,  NETSTAR,  SAP_T,  and  SAP_S_T. 

Each  time  is  a  sum  of  solution  times  of  16  mxm  transportation  problems,  in 
which  m  ranges  from  100  to  400,  maximum  cost  ranges  from  100  to  100,000. 
Total  solution  times  for  80  problems: 

RELAX  =  3150.18  (secs.),  NETFLO  -  2418.69  (secs.),  NETSTAR  »  1362.03 
(secs.),  SAP_T  »  1950.06  (secs.),  SAP_S_.T  »  1443.54  (secs.). 

SAP_S_T  dominates  both  RELAX  and  NETFLO  on  all  test  problem  sets.  For  small 
amounts  of  total  supply  (below  5,000)  SAP_S_T  is  faster  than  NETSTAR.  When  the  total 
supply  was  at  least  10,000,  NETSTAR  dominated  all  of  the  codes. 


VIII.  SUMMARY  AND  CONCLUSIONS 


In  this  study,  we  present  an  extension  of  the  shortest  augmenting  path  algorithm  to  the 
transportation  problem.  This  approach  has  been  extremely  successful  when  applied  to 
both  the  assignment  and  the  semi-assignment  problem  dominating  all  other  competing 
algorithms  by  a  wide  margin.  In  this  study,  we  found  that  this  algorithm  is  extremely 
sensitive  to  the  total  supply  and  degrades  as  the  total  supply  increases.  For  problems  with 
small  total  supply,  the  algorithm  is  the  best  method  available  outperforming  the  highly 
successful  NETSTAR.  The  scaling  method  is  very  effective  when  applied  to  this  algo¬ 
rithm  reducing  the  total  time  by  approximately  35%.  The  key  to  success  with  this  type  of 
algorithm  is  the  use  of  clever  heuristics  to  get  an  advanced  start  prior  to  application  of 
methods  for  generating  shortest  path  trees. 

Both  empirical  evidence  and  the  complexity  analysis  confirm  our  belief  that  network 
problems  which  have  small  total  supply  can  best  be  solved  using  dual  methods  and  net¬ 
work  problems  which  have  large  total  supply  can  best  be  solved  using  primal  methods. 
We  believe  that  the  present  study  is  the  first  to  clearly  demonstrate  this  phenomena.  This 
conclusion  is  consistent  with  our  previous  studies  on  the  assignment  problem  and  the 
semi-assignment  problem. 
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Table  1.  Comparison  of  RELAX,  NEITLO,  and  NETSTAR  on  nxn 
Transportation  Problems  Having  1,000  Units  of  Supply 
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Table  2.  Comparison  of  RELAX,  NETFLO,  and  NETSTAR  on  nxn 
Transportation  Problems  Having  5,000  Units  of  Supply 


o 

C 

• 

Cf) 

o 

o  a\  o\  o 

VO 

00  00  CNl  VO 
N-t  ^ 

CO  »o  n  n-h 
(S 

CO  CO 

CM  00  CO  O 

CO 

On 

• 

ec 

B 

•tm 

H 

a> 

>52^ 

tH 

00  00  00  00 

00  00  On  0\ 

tH  ^-h 

VO  O  »n 

CO  CO  Tj- 

oci 

VO 

CM 

1 

to 

C 

_o 

• 

2 

VO  t-f  ov  00 

Tf  o  n 

»0  VO  VO  VO 

O  ^  >0 

CO  ^  fNj 

On  o  00 
On  »0  VO  »0 
O  Tj-  CO  »n 

(S  CNi  n  <s 

O  O  »o 

»0  CO  Tt  O 
00  VO  00  ON 
CM  CO  CO  CO 

o 

CO 

C^ 

CM 

CO 

• 

• 

1-H  ^ 

CO  »-<»-•  VO 

CO  On  0> 

^  -rf  Tf  00 

On  On  CO 

CM  O  n-h  CO 
VO  On  CO 

o 

VO 

o 

B 

P 

a> 

to 

fS  CO  CO  f<S 

r<i  c<S 

VO  u-i  00  CO 
CO  CO  CO  CO 

Tt  ON 

VO  VO  VO  VO 

On 

■M- 

A 

c 

H 

to 

B 

w 

a 

u 

a 

ts  ^ 
o  VO 

Ov  Ov  Ov  On 

«o  »o  «o  »o 
<N|  r>J  VO  ’-t 
fO  »o  »o  »o 
tvj 

C^  00  CO  (N 
^  00  CO  vr> 
On  VO  Tt 

CO  Tt  ^ 

VO  VO  VO  O 
Tj-  VO  r«-  00 
VO  ?-l  NO  CO 
VO  VO  VO  VO 

CM 

o 

VO 

VO 

• 

a> 

e 

• 

CA 

o 

00  ^  00 
On  rJ  <S  O 

N-t  VO  ^  ^ 

o  00 

O  -vt  00 

VO  VO  00  r- 

Ti-  r-  CM  Tt 
N-i  VO  O  VO 

VO 

B 

0) 

to 

Si.^ 

n-h  rt 

00  CO  »0  On 

^  CO  rH 

00  VO  «n  On 
fNi  (S  Tt 

O  -Vt  VO  CO 
VO  VO  00  ■'t 

1-H 

00 

VO 

• 

to 

B 

.2 

a 

u 

V 

00  cn  V-)  vn 

O  fO  VO 

rH 

O  (S  VO  ON 

^-H 

^  ^  (Vj 

CM 

• 

to 

o 

U 

o 

00 

B 

es 

o 
o  o 
o  o  o 
o  o  o  o 
o  o  o  o 

o 
o  o 
o  o  o 
o  o  o  o 
o  o  o  o 

o 
o  o 
o  o  o 
o  o  o  o 
o  o  o  o 

o 
o  o 
o  o  o 
o  o  o  o 
o  o  o  o 

# 

e 

1  1  1  1 

1  1  1  1 

till 

1  1  1  1 

o 

o  o  o  o 

o  o  o  o 

o  o  o  o 

o  o  o  o 

•P4 

u 

to 

« 

o 

g 

M 

u 

< 

o 

o 

o 

o 

T-( 

o 

o 

o 

o 

o 

o 

o 

o 

On 

o 

o 

o 

o 

VO 

i 

• 

M 

ct 

z 

1 

to 

a> 

•o 

o 

Z, 

o 

o 

<s 

o 

o 

o 

o 

VO 

o 

o 

00 

• 

a> 

N 

CO 

o 

o 

o 

o 

, 

o 

o 

CO 

o 

o 

E-28 


Table  3.  Comparison  of  RELAX,  NETFTO,  and  NETSTAR  on  nxn 
Transportation  Problems  Having  10,000  Units  of  Supply 
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Table  4.  Comparison  of  RELAX,  NETFLO,  and  NETSTAR  on  nxn 
Transportation  Problems  Having  50,000  Units  of  Supply 
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Table  5.  Comparison  of  RELAX,  NETFLO,  and  NETSTAR  on  nxn 
Transportation  Problems  Having  100,000  Units  of  Supply 


o 


e 

o 

a 

•c 

w 

a> 

A 

E 

S 

z 

(S 


e 

• 

r4  VO  VO 

VO  CO  1-t 

<S  (S  wo 

o  wo 

VO 

cv) 

u 

lO  00  VO  VO 

wo  VO  O  VO 

Ov 

Ov  CS  Tl- 

cs 

00 

B 

4) 

vb  00  ov  00 

Ov  <  O 

T-J  00  C> 

d 

to 

H 

V) 

W' 

^  <s  rJ  cj 

CO  CO 

rv| 

w 

c 

o 

VO  cs  n 

r-  CO  ov  Tf 

Ov  «— t  CO  00 

wo  CO  O 

00 

VO  VO  ov 

O  VO  0\ 

o  o 

ts  Ov  Ov 

Ov 

VO 

vn  VO  »o  VO 

<M  VO  VO  VO 

wo 

Ov  wo  wo 

wo 

Ov 

o 

CM  N  r4  (S 

CS  CO  CO 

CO 

CO 

o 

•* 

00  VO  VO 

00  VO  o 

CO  VO  VO  »-H 

^  O  wo 

VO 

Ov 

Cj 

00  n  »-i 

O'!  00  wo 

Tl-  CO  O 

VO  Tf  O 

wo 

Ov 

E 

« 

c4  (si  cS  ri 

wo  wo  wo 

Tf  vb  o^  c^ 

<s  vb  »-J 

00 

00 

P 

CO  CO  CO  to 

VO  VO 

VO 

00 

(A 

B 

O 

•mm 

VO  fv|  f<) 

VO  00  wo  »-H 

00  ON  »-i 

CO  Tt  CO 

Ov 

ts 

m 

ov  ro  00 

00  Tf  Tt  o 

00  wo  o 

cvj  (S  r- 

00 

Ov  Ov  Ov 

CO  VO  VO  VO 

Ov  CO  wo  wo 

wo  CO  wo 

CO 

a> 

n  cs  n  <vi 

CO  Tt  ^ 

wo  VO  VO  VO 

VO 

wo 

o 

e 

• 

cn 

u 

00  m  »n 

VO  t"  VO 

O  O  Tt 

CO  00  O  VO 

Ov 

00  »-•  r-  VO 

rf  wo  o 

Ti-  CO 

fv|  Tt 

r* 

B 

4) 

^  CO  cri  vS 

ON  wo  »-J  Tf 

O  Tf  O  (S 

Ov  00  wo 

c4 

r-1  CS 

CO  VO  »-• 

O  wo  C^J 

wo 

Tj- 

V3 

6 

W 

Ov  Tf  rf 

’-H  Tf  VO  Ov 

CO  Tj"  ^  CO 

CO 

o 

VO 

Ia 

^  t  ts 

1— 1  T-l  ts 

(N 

wo 

s 

ts 

s 

M 

o 

o 

o 

o 

o  o 

o  o 

o  o 

o  o 

4> 

o  o  o 

o  o  o 

o  o  o 

o  o  o 

00 

o  o  o  o 

o  o  o  o 

o  o  o  o 

o  o  o  o 

e 

o  o  o  o 

o  o  o  o 

o  o  o  o 

o  o  o 

o 

o 

<4 

^-h 

u 

A 

1  1  1  1 

till 

1  1  1  1 

1  1  1 

1 

o  o  o  o 

o  o  o  o 

o  o  o  o 

o  o  o  o 

o 

o 

o 

o 

M 

o 

o 

o 

o 

H 

o 

o 

o 

o 

< 

o 

o 

o 

Ov 

o 

VO 

s 

9i 

4) 

•o 

p 

o 

o 

o 

o 

H 

o 

o 

o 

o 

1 

<s 

'it- 

VO 

00 

4) 

o 

o 

o 

o 

o 

o 

o 

<N 

o 

CO 

S-31 


I 


!e  6.  Comparison  of  NETSTAR  and  SAP_T  on  nxn 

Transportation  Problems  Having  1,000  Units  of  Supply 
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TOTAL  37719  286.30  1254  148.14 


Table  7.  Comparison  of  NETSTAR  and  SAP_T  on  nxn 

Transportation  Problems  Having  5,000  Units  of  Supply 
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Table  8.  Comparison  of  NETSTAR  and  SAP_T  on  nxn 

Transportation  Problems  Having  10,000  Units  of  Supply 
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TOTAL  31873  273.98  1438  414.22 


Table  9.  Comparison  of  NETSTAR  and  SAP_T  on  nxn 

Transportation  Problems  Having  50,000  Units  of  Supply 
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TOTAL  31359  258.86  1434  512.17 


Table  10.  Comparison  of  NETSTAR  and  SAP_T  on  nxn 

Transportation  Problems  Having  100,000  Units  of  Supply 
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Table  11.  Comparison  of  NETSTAR,  SAP_T,  and  SAP_S_T  on  nxn 
Transportation  Problems  Having  1,000  Units  of  Supply 
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Table  13.  Comparison  of  NETSTAR,  SAP_T,  and  SAP_S_T  on  nxn 
Transportation  Problems  Having  10,000  Units  of  Supply 
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